strange memory leak

Hi all,

I've written some code that is exhibiting some very strange behavior. It is basically an implementation of Prim's algorithm on a matrix of nodes to produce a minimum spanning tree. Each node can be connected right, up, left, or down.

I have two vectors, in and frontier, which correspond to the in and frontier sets in the algorithm. The main loop of the function is structured roughly like this:

while (!frontier.empty()) {
    pick a random node from frontier
    remove it from frontier and add it to in
    add all of its unvisited neighbors to frontier
}


Or something like that.

The algorithm works just fine and produces the correct results. However, I'm seeing a very odd memory leak. When I set a breakpoint at the start of the above while loop and step forward one iteration at a time while watching the process memory, it sometimes grows and sometimes doesn't, inexplicably. This is even more perplexing than you'd think: I'm doing absolutely no manual dynamic memory allocation anywhere in this method. I'm handling it ALL with vectors. The in and frontier vectors are vector<MazeNode *> objects, where MazeNode is the class for a single node in my graph. Both vectors hold pointers to MazeNodes, but the nodes themselves already exist in memory; they are not being allocated in this method.
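For context, here is a guessed-at minimal declaration of MazeNode, inferred only from the members the method below touches; the real class presumably has more to it (e.g. the wall/connection data that join() manipulates):

// Hypothetical minimal MazeNode, reconstructed from the fields used in the
// method below; the real class presumably has more members.
struct MazeNode {
    int row, col;             // position in the grid
    bool visited;             // true once the node has joined the "in" set
    bool frontier_visited;    // true once the node has been added to "frontier"
};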

It may be somewhat hard to follow, but here is the code for this method:

void Maze2d::prims(int r, int c) {

    std::vector<MazeNode *> in;
    std::vector<MazeNode *> frontier;

    // Mark the first node as visited
    nodes[r][c].visited = true;

    // Add it to "in"
    in.push_back(&nodes[r][c]);

    // Get pointers to every neighbor of the starting node that has not been visited
    std::vector<MazeNode *> unvisitedNeighbors = getUnvisitedNeighbors(r, c);
    // For each one (there will be at most 4), add a pointer to it to "frontier"
    for(unsigned int i=0; i < unvisitedNeighbors.size(); i++) {
        MazeNode *new_frontier = unvisitedNeighbors[i];
        frontier.push_back(new_frontier);
        new_frontier->frontier_visited = true;
    }

    // Throughout this loop, we basically pick a random node from "frontier,"
    // then join it with a random "in" node adjacent to it.
    while(!frontier.empty()) {

        // Get a random frontier node
        int new_in_index = rand() % frontier.size();
        MazeNode *new_in = frontier[new_in_index];
        int new_r = new_in->row, new_c = new_in->col;

        // Mark it as visited, add it to in, erase it from frontier
        new_in->visited = true;
        in.push_back(new_in);
        frontier.erase(frontier.begin() + new_in_index);

        // Get vectors of the visited and unvisited neighbors of the current node
        std::vector<MazeNode *> unvisitedNeighbors = getUnvisitedNeighbors(new_r, new_c);
        std::vector<MazeNode *> visitedNeighbors = getVisitedNeighbors(new_r, new_c);

        // Pick a random visited neighbor of the current node (i.e. a node that is "in")
        int rand_visited_neighbor_index = rand() % visitedNeighbors.size();
        MazeNode *rand_visited_neighbor = visitedNeighbors[rand_visited_neighbor_index];

        // Join the two nodes in the graph
        join(new_in, rand_visited_neighbor);

        // Add each unvisited neighbor of the newly added node to "frontier"
        for(unsigned int i=0; i < unvisitedNeighbors.size(); i++) {
            MazeNode *new_frontier = unvisitedNeighbors[i];
            // Only add the node to frontier if it is not already in frontier
            if (!new_frontier->frontier_visited) {
                frontier.push_back(new_frontier);
                new_frontier->frontier_visited = true;
            }
        }

    }
}
I don't know how large the vectors you're using are, but remember that a vector's internal array never shrinks, even if you clear() or resize(0) the vector, and that not every call to std::vector::push_back() triggers a reallocation of the internal array (http://www.cplusplus.com/forum/beginner/26208/#msg139683). The only way to truly make sure that all memory for a vector is freed is to destroy it.
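For example, a small standalone sketch like this shows the behaviour (the exact capacities printed will vary by implementation; in C++11 there is also shrink_to_fit(), which is only a non-binding request):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000000, 42);
    std::cout << "capacity after construction: " << v.capacity() << '\n';

    v.clear();   // size drops to 0...
    std::cout << "capacity after clear():      " << v.capacity() << '\n';   // ...but capacity is unchanged

    std::vector<int>().swap(v);   // swap with an empty temporary; its destructor frees the old storage
    std::cout << "capacity after swap trick:   " << v.capacity() << '\n';   // typically 0 now
    return 0;
}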
That's a really helpful thought. I tried revising the function using std::vector::reserve, so the opening lines now look like this:

    std::vector<MazeNode *> in;
    std::vector<MazeNode *> frontier;

    // the biggest in or frontier will ever be is rows * cols
    in.reserve(rows * cols);
    frontier.reserve(rows * cols);


However, the memory still grows in the same way it did before. I've also done the same thing for the visitedNeighbors and unvisitedNeighbors vectors, so even the tiny amount of memory they need (an array of four pointers, destroyed at the end of each loop iteration) is taken care of. I realize this may not be a "leak"; perhaps some object somewhere is simply adjusting the amount of memory it needs. But I want to know for sure that it is not a leak (obviously). I've even checked that the in array never gets resized by setting a conditional breakpoint that gets hit if in.capacity() ever changes; it doesn't.

It seems likely to me that your idea is related to what is happening, but I couldn't figure out where it's occurring. Is there anything else in that method that could be causing similar behavior? I know it's hard to analyze this since it is kind of taken out of context, but the few other methods that are called here are extremely tame.

Anyway, thanks helios and anyone else who has any ideas. :)
Update: I came up with the (perhaps obvious) idea to simply loop the method with while(true) and watch the memory; while it does oscillate (between about 690 and 740 KB), it never climbs above a certain point, so I don't have a memory leak. However, I am still curious about what exactly is happening in terms of memory allocation.
closed account (EzwRko23)
Memory fragmentation.

As your application runs for a long time, the heap gets fragmented - there are holes of free memory, but they are too small to allocate from, so the allocator allocates from new, fresh memory pages. At some point the situation stabilizes. This is typical behaviour for programs running without a compacting GC. How much memory is wasted is very application-dependent - sometimes it is just 5%, in extreme cases it can be 5 times as much or even more. However, RAM is cheap - the much worse problem with fragmentation is not that memory is wasted, but that your application can get much slower over time because of poor memory locality.

There are two solutions for this:
1. Try a different allocator (on Linux you can swap one in with LD_PRELOAD), e.g. Hoard or Google's tcmalloc.
2. Write a dedicated, fine-tuned allocator for some of your objects (but beware of hard-to-find bugs - such things are extremely difficult to debug).

Oh, there is also a third solution: use a language with good compacting GC - for programs that are heavy on allocation of **small** objects this can be a huge performance win - much faster allocation, good memory locality, no fragmentation and almost 0-cost of deallocation.
Interesting - not something I'm terribly anxious to dive into, considering I'm an intermediate-level C++ programmer at best. But it's good to know a little bit about what's going on behind the scenes. For now, I can live with the slight performance hit; it's a hobby programming project anyway :)
closed account (EzwRko23)
Oh, there is also one more thing I forgot. If your application allocates lots of small objects and then deallocates a random 99% of them, probably **no** memory will be returned to the OS at all. The OS hands out memory in whole pages, usually 4 kB. Even a single 4-byte object can keep a 4 kB page alive - so in this extreme case you can get up to 1000x memory overhead. Because typical allocators cannot move objects between pages, such almost-free pages stay reserved until your application exits or frees the remaining 1% of objects.

More on this problem here: http://blog.pavlov.net/2007/11/10/memory-fragmentation/
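A rough, platform-dependent sketch of that scenario (the allocation count and the 1% survival rate are arbitrary; how much resident memory actually drops after the frees depends entirely on the allocator, so treat the output as something to observe, not a guarantee):

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t N = 1000000;               // roughly a million small allocations
    std::vector<int *> blocks(N);
    for (std::size_t i = 0; i < N; ++i)
        blocks[i] = new int(static_cast<int>(i));

    std::puts("all blocks allocated - check the process memory, then press Enter");
    std::getchar();

    for (std::size_t i = 0; i < N; ++i)
        if (i % 100 != 0) {                       // free 99%, keep every 100th block alive
            delete blocks[i];
            blocks[i] = 0;
        }

    std::puts("99% of the blocks freed - resident memory often barely drops");
    std::getchar();

    for (std::size_t i = 0; i < N; i += 100)      // finally free the survivors
        delete blocks[i];
    return 0;
}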
Here's a quick test that can be performed to check for memory leaks. As with all testing, it can only prove the existence of problems, not their absence.
You suspect f() has a memory leak. You run it in a loop for some time and watch how its memory behaves. It's preferable to choose parameters that will exercise most code paths. If the amount of allocated memory stays largely the same, the function is probably not leaking memory. If it appears to rise at a roughly constant rate, it is almost certainly leaking.
When performing this test, it's important that the function is almost pure. No state should be kept in global variables, files, or anything else. The only way to tell the function what to do should be through its parameters.

http://en.wikipedia.org/wiki/Pure_function (Only in the first sense.)
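A minimal sketch of such a harness; suspect_function() and its parameters are placeholders for whatever routine is being tested, and its body here is just stand-in work:

#include <cstddef>
#include <vector>

// Stand-in for the routine under suspicion (placeholder name and signature).
void suspect_function(int rows, int cols) {
    std::vector<int> scratch;
    scratch.reserve(static_cast<std::size_t>(rows) * cols);   // placeholder work
}

int main() {
    // Same parameters every iteration and no hidden state: if the process's
    // memory climbs steadily while this runs, the function is leaking.
    for (;;)
        suspect_function(50, 50);
}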

EDIT:
Even a single 4-byte object can keep a 4 kB page alive - so in this extreme case you can get up to 1000x memory overhead.
That is, assuming that the allocator implementation is completely brain-dead.
closed account (EzwRko23)
The fastest way to check for memory leaks is to run the program under valgrind - that is easy compared to tracking down fragmentation problems. Those are especially tricky because they are not easy to spot when running just a single function. Problems arise when allocations and deallocations are interleaved between various parts of the code that use memory in different ways and at different times (e.g. objects with different lifetimes). Then you get the Firefox problem.


That is, assuming that the allocator implementation is completely brain-dead.


The allocator has nothing to do with this. This is a general property of all allocation schemes without compaction. An **almost-free** memory page cannot be returned to the OS, no matter what the allocator does. An almost-free memory page is also useless for allocating objects larger than 4 kB. Creating an almost-free page is easy - allocate a full page of small objects and then free all of them except one.
Thanks guys. Perhaps a good solution to this problem, for this particular method, is to simply move all the dynamic allocation outside the loop. That way the actual allocation only happens once for each vector involved, and inside the loop all we do is change their contents. The visitedNeighbors and unvisitedNeighbors vectors only ever have a maximum of four elements, so it seems silly to keep reallocating them on every iteration. Perhaps an even BETTER solution would be to use plain arrays declared outside the loop, which live on the stack; not only would all that allocating be avoided, but access to the elements themselves might be faster (I think?)
closed account (EzwRko23)
Great idea.
You're right when you say it's not possible to ask the OS for less memory than a page, but not when you say the allocator has no influence on how that memory is used.
Say I ask the system for 4 bytes of memory and it gives me 4K. If I'm stupid as hell, I'll take the first 4 bytes and throw away the rest. But if I'm smart, I'll keep the rest and save it for the next time I need at most 4092 bytes.

Try running this program and see how fast memory usage grows:
//#include <windows.h>   // for Sleep()
//#include <unistd.h>    // for usleep()

int main(){
	while (1){
		new int;          // deliberately leak one int per iteration
		//Sleep(10);      // uncomment one of these to slow the loop down
		//usleep(10000);
	}
	return 0;
}
If it grows in steps every few seconds, the allocator is being smart and reusing pages.
closed account (EzwRko23)
1. This is not the situation I described.
2. Your analysis is oversimplified.

You must not ignore deallocations. Every allocator works perfectly in the absence of deallocations, or for simple patterns of allocations/deallocations. I described a situation where you e.g. allocate 10000 pages of small objects and then delete most of those objects, but not all of them. You would expect most of the memory to be returned to the OS, but it won't happen. Then you have internal fragmentation overhead: you use just a small fraction of the allocated memory. Memory overhead is huge and cache utilisation also suffers a lot.

So you must be very careful with applications that require huge amounts of memory just for a particular task. That memory might never be returned after the task has finished (it can be reused, but not reclaimed). Even if your allocator is able to reuse it for the next big task, locality of reference will be terrible and you should expect lots of cache misses (and large-object allocations from the end of the heap - just like in the Firefox case).
I've revised my algorithm slightly as per my idea above (see code below). It runs almost twice as fast now :)

As you can see, there are now no variables local to the while loop. However, the memory is still not constant, and I am almost positive that the in and frontier vectors are what's causing the problem, since what else could be happening? Everything else is allocated on the stack.

What bugs me about this is that I did include the lines
in.reserve(rows * cols);
frontier.reserve(rows * cols);

What exactly does this do? I was under the impression that it allocated an internal array of size rows*cols for each vector. The vector's size NEVER exceeds this bound (I made extra sure by setting a conditional breakpoint in my debugger), so I don't see why the internal array should ever change throughout the function. By my memory indicator, though, it does. I guess just having vectors around means it's possible that memory paging becomes an issue, even if in theory the allocation should never change?
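A tiny standalone check along those lines (just a sketch) would confirm that impression: as long as size() never exceeds the reserved capacity, push_back() is not allowed to reallocate, so element addresses stay put:

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);               // one up-front allocation, capacity() >= 100

    v.push_back(0);
    const int *first = &v[0];     // remember where the internal array lives

    for (int i = 1; i < 100; ++i)
        v.push_back(i);           // all of this stays within the reserved capacity...

    assert(&v[0] == first);       // ...so the internal array is never reallocated or moved
    assert(v.capacity() >= 100);
    return 0;
}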

Anyway, let me know if you have any more insights. Your help so far has been invaluable.

EDIT: When I comment out all the push_back and erase calls, the memory stays constant. In theory, at least as far as I can tell, neither call should be doing any reallocation. But apparently they are. :(
void Maze2d::prims2(int r, int c) {
    int total_steps = 1;

    std::vector<MazeNode *> in;
    std::vector<MazeNode *> frontier;

    int unvisitedNeighbors[4];
    int visitedNeighbors[4];
    int numUnvisitedNeighbors = 0, numVisitedNeighbors = 0;
    int new_in_index = 0, new_r = 0, new_c = 0;
    int rand_visited_neighbor_index = 0;
    int i=0;
    MazeNode *new_in, *new_frontier, *rand_visited_neighbor;

    in.reserve(rows * cols);
    frontier.reserve(rows * cols);

    nodes[r][c].visited = true;
    in.push_back(&nodes[r][c]);

    numUnvisitedNeighbors = reGetUnvisitedNeighbors(r, c, unvisitedNeighbors);
    for(int i=0; i < numUnvisitedNeighbors; i++) {
        int dir = unvisitedNeighbors[i];
        MazeNode *new_frontier = nodeTo(r, c, dir);
        frontier.push_back(new_frontier);
        new_frontier->frontier_visited = true;
    }

    while(!frontier.empty()) {

        new_in_index = rand() % frontier.size();
        new_in = frontier[new_in_index];
        new_r = new_in->row;
        new_c = new_in->col;

        new_in->visited = true;
        in.push_back(new_in);
        frontier.erase(frontier.begin() + new_in_index);

        numUnvisitedNeighbors = reGetUnvisitedNeighbors(new_r, new_c, unvisitedNeighbors);
        numVisitedNeighbors = reGetVisitedNeighbors(new_r, new_c, visitedNeighbors);

        rand_visited_neighbor_index = rand() % numVisitedNeighbors;
        rand_visited_neighbor = nodeTo(new_r, new_c, visitedNeighbors[rand_visited_neighbor_index]);

        join(new_in, rand_visited_neighbor);

        for(i=0; i < numUnvisitedNeighbors; i++) {
            new_frontier = nodeTo(new_r, new_c, unvisitedNeighbors[i]);
            if (!new_frontier->frontier_visited) {
                frontier.push_back(new_frontier);
            }
            new_frontier->frontier_visited = true;
        }

        ++total_steps;
    }
}
That's all fine and good, but I wasn't arguing that memory fragmentation can't occur. I was arguing that the allocator implementation does have an influence on the usage patterns under which it occurs. The example is deliberately oversimplified.
By the way, thanks to xorebxebx for the valgrind recommendation; I had never heard of it! It's fantastic!

The vector thing is still puzzling me. I have no memory leaks in the function (verified by valgrind) but the memory is definitely growing with each iteration of the loop. See my most recent post for details.
closed account (EzwRko23)
And have you tried dumping the sizes of the vectors? Maybe some of your vectors are simply growing even though you think they shouldn't?
I have tried that, with the following lines:

std::cout << "in.size() = " << in.size() << ", in.capacity() = " << in.capacity() << "\n";
std::cout << "frontier.size() = " << frontier.size() << ", frontier.capacity() = " << frontier.capacity() << "\n";


Both sizes change, but both capacities are exactly the same throughout the execution of the function.

I've even gone so far as to keep a pointer to the address of an element of one of the vectors (first the first element, &in[0], then one somewhere in the middle) and compare it to the actual address of that element at the start of each iteration, and it never changes! So the internal array doesn't appear to be moving, but the memory still grows every few iterations. I'm absolutely perplexed.
Well, you are allocating new objects.
Where? My understanding was that all the actual allocation should be happening at the top of the function, but not inside the loop. The only thing I'm doing inside the loop is storing pointers in the vector's internal array. I believe you, of course, but can you elaborate?