[ 28.612876] readahead-collector: finished
[84838.147324] blochsim[5915]: segfault at 80 ip 00000030d56b7a60 sp 00007fffe7a71f18 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[84877.255251] blochsim[5943]: segfault at 80 ip 00000030d56b7a60 sp 00007fff2cac7528 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[84888.845668] blochsim[5970]: segfault at 80 ip 00000030d56b7a60 sp 00007fff5efa6048 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[84910.222659] blochsim[5998]: segfault at 80 ip 00000030d56b7a60 sp 00007fff99c57948 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[85000.521287] blochsim[6034]: segfault at 80 ip 00000030d56b7a60 sp 00007fff3b13a698 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[85094.520389] blochsim[6064]: segfault at 80 ip 00000030d56b7a60 sp 00007ffff3027da8 error 4 in libstdc++.so.6.0.14[30d5600000+e8000]
[85383.402836] blochsim[6318] general protection ip:30caf3ef6c sp:7fbcc24fecc8 error:0
[85383.402841] blochsim[6319] general protection ip:30caf3ef6c sp:7fbcc1cfdcc8 error:0
[85383.402849] blochsim[6317]: segfault at 7fbcc3359130 ip 00000030caf3ef6c sp 00007fbcc2cffcc8 error 6 in libc-2.13.so[30cae00000+191000] in libc-2.13.so[30cae00000+191000]
[85383.402856]
[85383.402861] in libc-2.13.so[30cae00000+191000]
[85484.457215] blochsim[6407]: segfault at 8 ip 00000000004109f0 sp 00007ffff2656e20 error 6 in blochsim[400000+22000]
[85484.457237] blochsim[6409]: segfault at c8 ip 00000000004109f0 sp 00007fbdcdf998f0 error 6 in blochsim[400000+22000]
[85484.457258] blochsim[6410]: segfault at 128 ip 00000000004109f0 sp 00007fbdcd7988f0 error 6 in blochsim[400000+22000]
[85484.457274] blochsim[6408]: segfault at 68 ip 00000000004109f0 sp 00007fbdce79a8f0 error 6 in blochsim[400000+22000]
[86108.767975] blochsim[6842]: segfault at 68 ip 0000000000410020 sp 00007f17ee8fb8f0 error 6
[86108.767981] blochsim[6841]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff6e24e760 error 4 in blochsim[400000+23000]
[86108.767989] in blochsim[400000+23000]
[86171.264609] blochsim[6872]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007ffff17d9fd0 error 4 in blochsim[400000+23000]
[86171.264622] blochsim[6873]: segfault at 68 ip 0000000000410120 sp 00007f531004b8f0 error 6
[86171.264629] in blochsim[400000+23000]
[86197.490224] blochsim[6903]: segfault at 68 ip 0000000000410020 sp 00007fbd1d4038f0 error 6
[86197.490233] blochsim[6902]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff6d5f7310 error 4 in blochsim[400000+23000]
[86197.490242] in blochsim[400000+23000]
[86277.177470] blochsim[6938]: segfault at 68 ip 00000000004106a0 sp 00007f96c837c8f0 error 6 in blochsim[400000+23000]
[86277.177489] blochsim[6937]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff3b97a830 error 4 in blochsim[400000+23000]
[87004.159332] blochsim[7181]: segfault at 8 ip 000000000040bde9 sp 00007fda78a18dc0 error 4
[87004.159340] blochsim[7180]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff6d7c3890 error 4 in blochsim[400000+23000]
[87004.159350] in blochsim[400000+23000]
[87035.849004] blochsim[7209]: segfault at 0 ip 000000000040bde9 sp 00007fff58793840 error 4 in blochsim[400000+23000]
[87035.849031] blochsim[7210]: segfault at 8 ip 000000000040bde9 sp 00007f6d5fa75dc0 error 4 in blochsim[400000+23000]
[87123.615338] blochsim[7244]: segfault at 8 ip 000000000040bde9 sp 00007f2ba6998dc0 error 4
[87123.615344] blochsim[7243]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffdf467650 error 4 in blochsim[400000+23000]
[87123.615351] in blochsim[400000+23000]
[87150.668015] blochsim[7273]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffa1829b70 error 4
[87150.668026] blochsim[7274]: segfault at 8 ip 000000000040be38 sp 00007f3b2fae0dc0 error 4 in blochsim[400000+23000]
[87150.668132] in blochsim[400000+23000]
[87167.229195] blochsim[7302]: segfault at 8 ip 000000000040bde9 sp 00007f49d2d57dc0 error 4 in blochsim[400000+23000]
[87167.229211] blochsim[7301]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff75f1ca70 error 4 in blochsim[400000+23000]
[87181.492398] blochsim[7330]: segfault at 8 ip 000000000040bde9 sp 00007fecde6dfdc0 error 4 in blochsim[400000+23000]
[87181.492417] blochsim[7329]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffed79b860 error 4 in blochsim[400000+23000]
[87204.430914] blochsim[7357]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fff733bc480 error 4 in blochsim[400000+23000]
[87204.430926] blochsim[7358]: segfault at 8 ip 000000000040be38 sp 00007f1acb75adc0 error 4
[87204.430933] in blochsim[400000+23000]
[87231.826316] blochsim[7385]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffefc51f30 error 4 in blochsim[400000+23000]
[87290.080334] blochsim[7415]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffa2153610 error 4 in blochsim[400000+23000]
[87303.677111] blochsim[7443]: segfault at ffffffffffa47468 ip 0000000000403bd3 sp 00007fffd5f295c0 error 4 in blochsim[400000+23000]
[309067.058348] blochsim[11433] trap divide error ip:40d1f6 sp:7fff47021220 error:0 in blochsim[400000+24000]
From that we can see that your process hit a series of segmentation faults starting about 1 day after the computer booted, with a different PID each time, so probably during a development phase.
Then, after about 3.5 days, you got a divide error somewhere in your process (probably a division by 0). It doesn't tell whether it is in your own code or in library code, however: the ip falls inside the blochsim binary, but that also covers inlined template code or statically linked library code.
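For reference, the fields in the "segfault at" lines can be decoded mechanically. Here is a minimal sketch, assuming the usual x86 page-fault error-code bits (bit 0 = page was present, bit 1 = write access, bit 2 = user mode). The names decode_fault, describe_fault and offset_in_object are made up for illustration, and the decoder does not apply to the "general protection" or "trap divide error" lines:

```cpp
#include <cstdint>
#include <string>

// Decoded view of the x86 page-fault error code that the kernel prints
// as "error N" in a "segfault at" line.
struct FaultInfo {
    bool protection;  // bit 0: 1 = page was present (protection fault), 0 = page not mapped
    bool write;       // bit 1: 1 = write access, 0 = read access
    bool user;        // bit 2: 1 = fault happened in user mode
};

FaultInfo decode_fault(std::uint32_t error) {
    return FaultInfo{(error & 1u) != 0, (error & 2u) != 0, (error & 4u) != 0};
}

// Human-readable summary of the error code.
std::string describe_fault(std::uint32_t error) {
    const FaultInfo f = decode_fault(error);
    std::string s = f.user ? "user-mode " : "kernel-mode ";
    s += f.write ? "write to " : "read from ";
    s += f.protection ? "a protected page" : "an unmapped page";
    return s;
}

// Offset of the faulting instruction inside the mapped object, computed
// from the "ip" value and the base address in "name[base+size]".
std::uint64_t offset_in_object(std::uint64_t ip, std::uint64_t base) {
    return ip - base;
}
```

With this, "error 4" reads as a user-mode read from an unmapped page and "error 6" as a user-mode write to one. Fault addresses as small as 8, 68 or 80 usually mean a member access through a null pointer. For the libstdc++ lines, offset_in_object(0x30d56b7a60, 0x30d5600000) gives 0xb7a60, which a tool such as addr2line could translate into a function name if the matching library file is available.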
The divisions in the code you posted don't seem to be the problem; maybe it is in GetFrequency? But I don't see how a division by zero would cause a bad_alloc exception, so maybe it is not related at all.
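One thing that may narrow the search: on x86 the "trap divide error" line comes from an integer division instruction. Floating-point division by zero yields inf or NaN by default and does not trap, so the candidates are integer `/` and `%` operations. A minimal sketch of a guarded integer division follows; checked_div is a hypothetical helper (not something from the posted code) and it assumes C++17 for std::optional:

```cpp
#include <limits>
#include <optional>

// Integer division that reports failure instead of executing a division
// that would be undefined behavior (and would trap on x86).
std::optional<int> checked_div(int numerator, int denominator) {
    if (denominator == 0) {
        return std::nullopt;  // division by zero
    }
    if (numerator == std::numeric_limits<int>::min() && denominator == -1) {
        return std::nullopt;  // quotient does not fit in int
    }
    return numerator / denominator;
}
```

Temporarily routing suspicious divisions through something like this (or simply asserting that the divisor is non-zero) should show which call site receives the bad value.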
I believe it's a compiler problem related to the operating system. File-memory mapping is a very nasty feature and causes a lot of problems... I don't think my code has a problem, especially since, like I said, it worked when I reduced the size of the mapped part of the file!!! That has basically nothing to do with the functionality of the program!!!
The allocation problem happens when defining a vector< vector<double> > and assigning the return value of another function to it.
I finally resolved the problem. It's a bug in many Linux systems.
The problem was that my class mmap_cpp was re-using the function mmap_cpp::openfile(). This function unmaps the current map, closes the previously opened file, reopens the new file, and maps it again.
It's definitely not efficient to do it this way (I fixed it by adding a function mmap_cpp::remap()); nevertheless, the system shouldn't cause such exceptions.
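To make the open-once, remap-many idea concrete, here is a rough sketch of how such a wrapper could look. MappedFile and its members are hypothetical and are not the mmap_cpp class from this thread. It assumes a read-only, POSIX-only use case and checks every return value, including comparing the mmap result against MAP_FAILED rather than against a null pointer:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical read-only mapping wrapper: the file is opened once and
// different windows of it can be mapped without reopening.
class MappedFile {
public:
    explicit MappedFile(const std::string& path)
        : fd_(::open(path.c_str(), O_RDONLY)) {
        if (fd_ == -1) throw std::runtime_error("open failed: " + path);
    }
    ~MappedFile() {
        unmap();
        ::close(fd_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // Replace the current window with [offset, offset + length).
    // offset must be a multiple of the page size.
    const char* remap(off_t offset, std::size_t length) {
        unmap();
        void* p = ::mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd_, offset);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = p;
        length_ = length;
        return static_cast<const char*>(data_);
    }

private:
    // Release the current window, if any, with the length it was mapped with.
    void unmap() {
        if (data_ != nullptr) {
            ::munmap(data_, length_);
            data_ = nullptr;
            length_ = 0;
        }
    }

    int fd_;
    void* data_ = nullptr;
    std::size_t length_ = 0;
};
```

The point of the design is that every mmap is paired with exactly one munmap of the same length, and a failed open or mmap becomes an exception at the call that failed instead of a crash somewhere later.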
I'm not sure how this error can be reported. It showed up in different forms under several Linux distributions:
Ubuntu: Unable to open file, openfile simply returns -1 at non-specific points.
Mint: Unable to open file, openfile simply returns -1 at non-specific points.
Fedora: Segmentation fault at some unpredictable specific point.
CentOS: std::bad_alloc exception at some unpredictable specific point.
Note: an error at a specific point means that the error occurs at the same point every time the program is re-run.
All these operating systems produced these problems with the same code. Isn't this an obvious bug that has to be reported?
Because every operating system gives a different error that has nothing to do with the behavior of the code. I mean, how could we get these different errors just because we reopen the same file many times rather than remapping it? Does it make sense?
It's possible if your code is relying on undefined behavior, either at the language level or at the system level. The fact that different kernel versions are crashing your code at random points, or that the same kernel crashes your code with memory-related errors at the same point at random, is consistent with memory corruption by the application code.
I'm not saying that it's impossible to find a bug in the Linux kernel. I'm saying it's just much, much, much more likely that you're making a mistake than there being a bug in code thousands of people use and review every day. If you're going to report a bug, at least be completely sure that your code isn't buggy by running it through Valgrind, or try to reproduce the bug in the smallest code possible, so that you can have more solid evidence than "it crashes consistently". Otherwise you'll just get ignored.
Libraries on different systems are implemented differently, compiled with different options/etc. I think it is perfectly reasonable that a low level error (such as a memory problem) could show up differently on different systems.
You haven't posted all of your code, so I can't really analyze it. I could be talking garbage, but my instinct is that you are not calling munmap before closing the file, and you are in fact running out of process address space, not memory.
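To illustrate the distinction: POSIX specifies that closing the file descriptor does not remove the mapping, so each mapping keeps its address range until munmap is called on it. Below is a small sketch of that behavior. read_via_mapping is a hypothetical helper, and the caller is assumed to pass a length no larger than the file size:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Reads the first `length` bytes of `path` through a mapping, closing the
// descriptor *before* touching the mapped memory. Returns an empty string
// on any failure. `length` must not exceed the file size.
std::string read_via_mapping(const std::string& path, std::size_t length) {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd == -1) return {};

    void* p = ::mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping, if it was created, stays valid after this
    if (p == MAP_FAILED) return {};

    std::string out(static_cast<const char*>(p), length);

    // Only munmap releases the address range; skipping it leaks address
    // space on every call even though no descriptor is left open.
    ::munmap(p, length);
    return out;
}
```

If an openfile()-style function closes the descriptor but skips or mis-sizes the munmap, repeated calls keep consuming address space until a later mmap or allocation fails, which would look like unrelated errors at a later point in the program.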
I already showed the Valgrind results at the beginning of this thread. Please check the result! It says my program manages memory perfectly. I'm very careful with memory management, since I use templates and std containers.
Under very strict conditions, the code has worked on CentOS (explained in a previous post). I can't really draw the right conclusions from all this. I mean, all I do is map and unmap and open and close files, which is all done automatically with no memory leaks, and Valgrind agrees... How could I be making a mistake? Debuggers show errors at stupid places, like in an assignment of vectors (which doesn't make sense at all), and sometimes segmentation faults when resizing a vector! How would you handle such an error??? Even ddd (the debugger) failed to catch it!!
I'm only suggesting reporting this error because I'm out of options!!!! What else could I do to check whether the code has a bug? And by the way, where can I report this? Does anyone have experience with such reporting, and would it go easily?
"debuggers show errors at stupid places, like in an assignment of vectors (which doesn't make sense at all), and some times segmentation faults at resizing a vector!"
These point to heap corruption even more strongly. Are you running Valgrind with just leak detection, or are you telling it to also watch for buffer overflows and other kinds of memory errors?
EDIT: Just so you know the kind of hard-to-find bug you may be facing here: one time I was getting crashes like the ones you're getting now. They were all over the place: when opening files, when allocating memory, when freeing memory. There was no pattern. After running the program through Valgrind many times and watching its behavior very carefully, calling the debugger at just the right time revealed I was using an uninitialized variable as a switch expression, which wreaked all sorts of havoc with the allocations that happened inside the switch and propagated undefined behavior non-deterministically. This is why I'm almost completely sure it's a bug in your code.
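For what it's worth, that class of bug and its fix can be sketched in a few lines. The names below (Mode, Options, buffer_size_for) are invented for illustration and are not the code from that incident. The only difference between the broken and the fixed version is the default member initializer:

```cpp
enum class Mode { Read, Write, Append };

struct Options {
    // Default member initializer: a default-constructed Options always has
    // a well-defined mode. Without "= Mode::Read", `Options o;` would leave
    // `mode` indeterminate, and switching on it is undefined behavior.
    Mode mode = Mode::Read;
};

int buffer_size_for(const Options& o) {
    switch (o.mode) {
        case Mode::Read:   return 4096;
        case Mode::Write:  return 8192;
        case Mode::Append: return 1024;
    }
    return 0;  // not reached for valid enumerators; keeps the result defined
}
```

When the initializer is missing, Valgrind's memcheck typically reports this as "Conditional jump or move depends on uninitialised value(s)", and compiler warnings such as GCC's -Wuninitialized can sometimes catch it at build time.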