I'm working on an assignment that requires parsing a very large text file. Switching from Scanner's next(), nextInt(), etc. to nextLine() and then parsing the lines myself almost doubled the memory consumption of the entire program (almost 200 MB extra).
So I tracked down the issue, and it turns out that String.substring() is the cause.
The fix is this:
String line;
String s;
// strangely, since substring() is supposed to return a new String anyway
s = new String(line.substring(a, b));
// instead of
s = line.substring(a, b);
And as bad as it is that the Java standard API has such an egregious memory leak issue, what's worse is that it's not even documented in Java documentation; it's not even incorrect usage according to the standard.
It really makes you think about the consequences of a language where you aren't controlling the memory usage.
And what's especially shameful is that this bug was reported in 2001. According to the report in their bug database, they decided on a fix in 2012, 11 years later. However, I'm using javaSE-1.7.0_21, and the issue has definitely not been solved in this implementation.
> And as bad as it is that the Java standard API has such an egregious memory leak issue,
> what's worse is that it's not even documented in Java documentation
I remember reading somewhere that this has been fixed in Java 7: the String.substring implementation was changed to behave as it does in C++ and return a new String (instead of a tuple <reference to containing string, offset, count>).
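To make the difference concrete, here is a minimal sketch of the two strategies. SharedSlice is a hypothetical class written for this post, not the actual JDK source; it only models the (array, offset, count) layout described above, so treat it as an illustration under that assumption.

```java
import java.util.Arrays;

/** Hypothetical model of the (array, offset, count) layout; not real JDK code. */
final class SharedSlice {
    final char[] value;  // backing array, possibly shared with the parent
    final int offset;    // index of the first character of this slice
    final int count;     // number of characters in this slice

    SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSlice of(String s) {
        return new SharedSlice(s.toCharArray(), 0, s.length());
    }

    /** Sharing substring: constant time, but keeps the whole parent array reachable. */
    SharedSlice slice(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", end=" + end);
        }
        return new SharedSlice(value, offset + begin, end - begin);
    }

    /** Copying variant: linear time, retains only the characters it needs. */
    SharedSlice copy() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    /** Size of the array this slice keeps alive. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice line = SharedSlice.of("a very long line read from the input file");
        SharedSlice shared = line.slice(2, 6);
        SharedSlice trimmed = shared.copy();
        System.out.println(shared + " retains " + shared.retainedChars() + " chars");
        System.out.println(trimmed + " retains " + trimmed.retainedChars() + " chars");
    }
}
```

With the sharing layout, a four-character token keeps its whole source line alive for as long as the token is referenced, which is why wrapping the result in new String(...) helped on older runtimes.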
This actually seems like a new bug. After checking the standard library source for the methods I use, everything seems to check out at first glance, so it's probably not the old one.
Here is what I have found after some more experimentation:
Not using new consistently results in 370 to 390 MB total memory use.
Using new usually results in 190 to 210 MB; however, about 1 time in 6 it still ends up using 370 to 390 MB.
Not using new, but adding a print statement for every scanned line, consistently results in about 210 to 250 MB total usage.
Strange. I guess it is probably not the same bug; more likely the garbage collector is deciding whether or not to collect yet. The memory is released when the application terminates, but if the program does retain an extra 190 or so MB at the start, it holds it for the duration of the program's execution.
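One way to separate "memory still referenced" from "garbage not collected yet" is to read the heap from inside the JVM after asking for a collection. The sketch below is only a rough probe: HeapProbe, usedHeapBytes and parseIds are names made up for this post, the input lines are synthetic, and System.gc() is just a request that the JVM is free to ignore, so the numbers are approximate and will vary between runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for comparing heap use with and without the copy. */
final class HeapProbe {
    /** Approximate heap in use, in bytes; System.gc() is only a hint to the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds n synthetic lines and keeps only the numeric field from each. */
    static List<String> parseIds(int n, boolean copy) {
        List<String> kept = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String line = "record " + i + " followed by payload that is not needed later";
            // "record " is 7 characters, so the number starts at index 7.
            String id = line.substring(7, line.indexOf(' ', 7));
            kept.add(copy ? new String(id) : id);
        }
        return kept;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Pass any argument to run with the extra new String(...) copy.
        List<String> kept = parseIds(200_000, args.length > 0);
        long after = usedHeapBytes();
        System.out.println(kept.size() + " fields kept, heap grew by about "
                + (after - before) / (1024 * 1024) + " MB");
    }
}
```

If the figure after the requested collection is similar with and without the copy, the extra memory seen from outside the process is more likely uncollected garbage or heap the JVM has reserved than strings that are still being held.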