Using threads gives Segmentation fault

Hi,

I am working with a program that simulates the Single Particle Model. I am following this: https://github.com/davidhowey/SLIDE.

The issue happens at Degradation.cpp line 90:

void Calendar_one(const struct Model& M, const struct DEG_ID& degid, int cellType, int verbose, double V, double Ti, int Time, int mode, int timeCycleData, int timeCheck, struct checkUpProcedure proc, string name)

which is called via a thread:

std::thread cal1 (Calendar_one,M, degid, cellType, verbose, V, Ti, Time, mode, timeCycleData, timeCheck, proc, name); // make a new thread and simulate this

cal1.join();


I ran GDB and I get the following trace:


Starting program: /home/julio/eclipse-workspace-new/SLIDE/build/default/slide
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.
start simulations
The CalendarAgeing function uses a non-optimal number of threads. Ideally, you should use 7 cores.
0_2-0_0-0_2-3_1_ Batch 1
calendar 1, T=5, 100%
[New LWP 10893]

Thread 2 "slide" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 10893]


(gdb) where
#0  0x00005555555690e1 in Calendar_one (M=<error reading variable: Cannot access memory
 at address 0x7ffff70cfdc8>, 
    degid=<error reading variable: Cannot access memory at address 0x7ffff70cfdc0>, 
    cellType=<error reading variable: Cannot access memory at address 0x7ffff70cfdbc>, 
    verbose=<error reading variable: Cannot access memory at address 0x7ffff70cfdb8>, 
    V=<error reading variable: Cannot access memory at address 0x7ffff70cfdb0>, 
Ti=<error reading variable: Cannot access memory at address 0x7ffff70cfda8>, 
    Time=<error reading variable: Cannot access memory at address 0x7ffff70cfda4>, 
    mode=<error reading variable: Cannot access memory at address 0x7ffff70cfda0>, 
timeCycleData=3600, timeCheck=30, 
    proc=<error reading variable: Cannot access memory at address 0x7ffff70cfd98>, 
    name=<error reading variable: Cannot access memory at address 0x7ffff70cfd90>) 
at /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:90

#1  0x000055555557a6a4 in std::__invoke_impl<void, void (*)(Model const&, DEG_ID const&, int, int, double, 
double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (
    __f=@0x5555555e53c0: 0x5555555690c7 <Calendar_one(Model const&, DEG_ID const&, int, int, double, 
    double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
    std::allocator<char> >)>) at /usr/include/c++/9/bits/invoke.h:60

#2  0x00005555555797a0 in std::__invoke<void (*)(Model const&, DEG_ID const&, int, int, double, double, 
int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (
    __fn=@0x5555555e53c0: 0x5555555690c7 <Calendar_one(Model const&, DEG_ID const&, int, int, double, 
    double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
    std::allocator<char> >)>) at /usr/include/c++/9/bits/invoke.h:95

#3  0x0000555555578ccf in std::thread::_Invoker<std::tuple<void (*)(Model const&, DEG_ID const&, int, int, 
double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_invoke<0ul, 1ul, 2ul, 
3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul> (this=0x5555555e4558) at /usr/include/c++/9/thread:244

#4  0x000055555557891f in std::thread::_Invoker<std::tuple<void (*)(Model const&, DEG_ID const&, int, int, 
double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator() 
(this=0x5555555e4558) at /usr/include/c++/9/thread:251

#5  0x000055555557888a in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(Model const&, 
DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, 
int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::
_M_run (this=0x5555555e4550)
    at /usr/include/c++/9/thread:195
#6  0x00007ffff7ea6d84 in ?? ()
#7  0x00007ffff7a4b700 in ?? ()
#8  0x0000000000000000 in ?? ()
(gdb) 





The crash happens at the beginning of the function, so the issue cannot be within the function body. It has to be somewhere in the parameter passing, but unfortunately I cannot see it.

However, when I remove the thread and just call the function, it works fine.

I am puzzled because I do not understand this. I have checked many forums and followed most of the recommendations. I have increased the stack size to 32MB using ulimit (via rlimit). I know the amount of data passed is big, but the original project does not warn about the size of the structure.

The only thing I can think of is that the thread library is not compatible. Nonetheless, I have created a very simple call with print commands, and this works. It only fails when calling the full function.

Just to add more information about the Calendar_one function:

void Calendar_one(const struct Model& M, const struct DEG_ID& degid, int cellType, int verbose,
double V, double Ti, int Time, int mode, int timeCycleData, int timeCheck, 
struct checkUpProcedure proc, string name){
	/*
	 * Function which simulates one calendar ageing regime.
	 * It does little more than calling the corresponding function from Cycler.cpp
	 */
	
	// settings of the cycler
	double dt = 2;			
	if (Ti < 40)
		dt = 5;		

	// Make a cell, the type of the cell depending on the value of 'cellType'
	Cell c1;
	if (cellType ==0)
		c1 = Cell_KokamNMC (M, degid, verbose);		// a high power NMC 
	else if (cellType ==1)
		c1 = Cell_LGChemNMC (M, degid, verbose);	// a high energy NMC 
	else
		c1 = Cell_user(M, degid, verbose);			 
	// Make the cycler
	Cycler cycler(c1, name, verbose, timeCycleData);

	// Call the Calendar-function of the cycler. Wrap it in a try-catch to avoid fatal errors
	try{
		cycler.calendarAgeing(dt, V, Ti, Time, timeCheck, mode, proc);
	}
	catch(int err){
		cout<<"Calendar_one experienced error "<<err<<" during execution of "<<name<<", abort this test"<<endl << std::flush;
		if(err == 15){
			cout<<"Error 15 means that the cell had degraded too much to continue simulating. \n"	<<endl<<flush;
		}
	}
}

std::thread cal1 (Calendar_one,M, degid, cellType, verbose, V, Ti, Time, mode, timeCycleData, timeCheck, proc, name); // make a new thread and simulate this

cal1.join();
Does that mean you create a thread and in the very next line you call join()? If so why do you create a thread at all?

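Look at const struct Model& M, const struct DEG_ID& degid when using a thread. std::thread copies its arguments, so the function receives references to those copies rather than to your original objects. To pass the originals by reference you need std::ref(). See: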

https://en.cppreference.com/w/cpp/thread/thread/thread
http://www.cplusplus.com/reference/functional/ref/?kw=ref

std::thread cal1 (Calendar_one,std::ref(M), std::ref(degid), cellType, verbose, V, Ti, Time, mode, timeCycleData, timeCheck, proc, name);
However, you need to make sure that these references are valid as long as the thread exists.
Hi coder777

Thanks for responding to my cry for help! This is really appreciated

I just copied one of the calls from the code. In the real code, there are 7 calls to the function before I call join 7 times...

I will check your suggestion and come back to the forum with an update.
Hi coder777,

I applied the fix and unfortunately it did not work. However, I can see that the function is failing in the join call. I checked all the parameters passed to the function and I can see them via ddd.

CalendarAgeig (M=..., pref="0_2-0_0-0_2-3_1_", degid=..., cellType=0, verbose=1) at /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:977

But when executing the next line, I get the segmentation fault.

Thread 2 "slide" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7a4b700 (LWP 4814)]
0x00005555555690e1 in Calendar_one (M=<error reading variable: 
Cannot access memory at address 0x7ffff70cfdc8>, 
degid=<error reading variable: Cannot access memory at address 0x7ffff70cfdc0>, 
cellType=<error reading variable: Cannot access memory at address 0x7ffff70cfdbc>, 
verbose=<error reading variable: Cannot access memory at address 0x7ffff70cfdb8>, 
V=<error reading variable: Cannot access memory at address 0x7ffff70cfdb0>, 
Ti=<error reading variable: Cannot access memory at address 0x7ffff70cfda8>, 
Time=<error reading variable: Cannot access memory at address 0x7ffff70cfda4>, 
mode=<error reading variable: Cannot access memory at address 0x7ffff70cfda0>,
 timeCycleData=3600, 
timeCheck=30, 
proc=<error reading variable: Cannot access memory at address 0x7ffff70cfd98>, 
name=<error reading variable: Cannot access memory at address 0x7ffff70cfd90>) 
at /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:90

(gdb)
If you use gcc or clang you can compile with -fsanitize=thread and run it.
Maybe you get a more meaningful error msg.
Thanks thmm,

I will try this suggestion.
It is possible that the problem already occurs before you start the thread. It makes no sense that it cannot access even simple copied values such as mode.

What you can do is add more debug output and/or comment out code while the crash still happens, to narrow down the offending part.
However, if I remove the threads from the code and run each function, it works. It is happening when the join is called.

More interestingly, if I run simple functions, the threads work. This tells me that the hardware supports threads.

When I ran AddressSanitizer, I got:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==11157==ERROR: AddressSanitizer: stack-overflow on address 0x7f01913ff590 (pc 0x5640906a24b1 bp 0x7f0191bfe5b0 sp 0x7f01913ff590 T1)
#0 0x5640906a24b0 in Calendar_one(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:90
#1 0x5640906b7d2a in void std::__invoke_impl<void, void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__invoke_other, void (*&&)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model&&, DEG_ID&&, int&&, int&&, double&&, double&&, int&&, int&&, int&&, int&&, checkUpProcedure&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (/home/julio/eclipse-workspace-new/SLIDE/build/default/slide+0x50d2a)
#2 0x5640906b7629 in std::__invoke_result<void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::type std::__invoke<void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(void (*&&)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model&&, DEG_ID&&, int&&, int&&, double&&, double&&, int&&, int&&, int&&, int&&, checkUpProcedure&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (/home/julio/eclipse-workspace-new/SLIDE/build/default/slide+0x50629)
#3 0x5640906b71ba in void std::thread::_Invoker<std::tuple<void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_invoke<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul>(std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul>) (/home/julio/eclipse-workspace-new/SLIDE/build/default/slide+0x501ba)
#4 0x5640906b6ff1 in std::thread::_Invoker<std::tuple<void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator()() (/home/julio/eclipse-workspace-new/SLIDE/build/default/slide+0x4fff1)
#5 0x5640906b6f59 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), Model, DEG_ID, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_M_run() (/home/julio/eclipse-workspace-new/SLIDE/build/default/slide+0x4ff59)
#6 0x7f01956a0d83 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd6d83)
#7 0x7f0195241608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477
#8 0x7f0195390292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

SUMMARY: AddressSanitizer: stack-overflow /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:90 in Calendar_one(Model const&, DEG_ID const&, int, int, double, double, int, int, int, int, checkUpProcedure, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
Thread T1 created by T0 here:
#0 0x7f01957e5805 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a805)
#1 0x7f01956a1048 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd7048)
#2 0x5640906aaa7b in CalendarAgeig(Model const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, DEG_ID const&, int, int) /home/julio/eclipse-workspace-new/SLIDE/src/Degradation.cpp:967
#3 0x5640906824e6 in main /home/julio/eclipse-workspace-new/SLIDE/src/main.cpp:187
#4 0x7f01952950b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

==11157==ABORTING
julio@julio:~/eclipse-workspace-new/SLIDE/build/default$ ulimit
unlimited


Hi coder777

I found my problem and I will document it here for future reference.

The issue is with the stack. I had added code to increase the stack size using getrlimit and setrlimit. I thought this would propagate in Eclipse each time a thread is created. But my test showed that the threads in Eclipse start another process, which causes the stack size to revert to its original value.

To fix the issue, in the terminal from which Eclipse will be launched, run ulimit -S -s <your value> and then start Eclipse. This value will then be used for the whole session. Eclipse will still reset the limit every time, but the value it resets to is now the one set in the session.

Just a note: setting this value using ulimit can affect other programs. If you run into an error that says "Resource temporarily unavailable", close the Eclipse session and the terminal. This will free the resources.