Eigen memory management issues

I am having some memory performance issues with my Eigen code. For example, monitoring memory usage with Task Manager, I notice that the code uses a lot of memory (more than 10 GB), which obviously gets worse as I increase the size of the Eigen tensors.

I have tried a few things to improve the performance of my code in general, such as the following:

1. Use optimization flags. This ultimately gives me a speed-up but does not solve the memory issues:
 
-ffast-math -fno-math-errno -DNDEBUG -march=native -Ofast
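
For reference, the full compile command looks roughly like this (the compiler, source file name, and FFTW link flag are placeholders for my actual build):

g++ -Ofast -ffast-math -fno-math-errno -DNDEBUG -march=native main.cpp -o main -lfftw3

(As far as I know, -Ofast already implies -ffast-math, so listing both is redundant but harmless.)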

2. Avoid passing "large" arguments by value; pass them by reference instead, for example:
 
// inputs by const reference, output by non-const reference
void TensorProduct(const Eigen::Tensor<std::complex<double>, 3>& Tensor1,
                   const Eigen::Tensor<double, 3>& Tensor2,
                   Eigen::Tensor<std::complex<double>, 3>& Product);


3. Avoid creating temporary objects, since I am using dynamic-size matrices. I checked for this by setting a breakpoint in check_that_malloc_is_allowed() in Eigen/src/Core/util/Memory.h, which is hit whenever Eigen allocates from the heap (e.g. for a temporary).
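
For anyone who wants to reproduce that check, this is roughly the setup; a minimal sketch rather than my actual code. EIGEN_RUNTIME_NO_MALLOC has to be defined before any Eigen include, and the assertion only fires in builds without -DNDEBUG:

#define EIGEN_RUNTIME_NO_MALLOC   // must come before any Eigen header
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd a(64, 64), b(64, 64), c(64, 64);
    a.setRandom();
    b.setRandom();

    // From here on, any hidden Eigen heap allocation triggers an assertion.
    Eigen::internal::set_is_malloc_allowed(false);
    c = a + b;                     // fine: coefficient-wise, no temporary
    // Eigen::MatrixXd t = a * b;  // would allocate and fire the assertion
    Eigen::internal::set_is_malloc_allowed(true);
}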

I also read somewhere here that maybe I can manage memory through Eigen::Map instead. However, I have only used this with Eigen matrices, not tensors, so I am wondering whether there is a way to apply it here that would help with the memory usage. For example, this is how my Eigen tensors are initialized in my code:
static const int nx = 64;  
static const int ny = 64; 
static const int nz = 64;

Eigen::Tensor<std::complex<double>, 3> Product(nx,ny,nz);  
Product.setZero();
Eigen::Tensor<std::complex<double>, 3> Tensor1(nx,ny,nz);  
Tensor1.setZero();
Eigen::Tensor<double, 3> Tensor2(nx,ny,nz); 
Tensor2.setZero();
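
For what it's worth, the tensor module does have a Map equivalent, Eigen::TensorMap, which wraps an existing buffer without allocating anything itself. A minimal sketch of what I have in mind (the buffer name is made up):

#include <unsupported/Eigen/CXX11/Tensor>
#include <complex>
#include <vector>

int main()
{
    const int nx = 64, ny = 64, nz = 64;

    // One flat buffer allocated up front.
    std::vector<std::complex<double>> buffer(nx * ny * nz);

    // A non-owning 3D view over that buffer: no allocation, no copy.
    Eigen::TensorMap<Eigen::Tensor<std::complex<double>, 3>> view(buffer.data(), nx, ny, nz);

    view.setZero();  // writes directly into 'buffer'
}

As I understand it, this does not shrink the footprint by itself (the data still has to live somewhere), but it would let me reuse one buffer across steps instead of keeping several full tensors alive at once.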

Obviously, this is a small part of my code. The full code is much larger and more complex.

NOTE: A realistic working example of my code can be found on my GitHub:
https://github.com/JamieAlM/Eigen_Optimization

Is this an issue with the Eigen library in particular? Is there no way to manage or improve the memory performance of my Eigen code? Usually this kind of behaviour might indicate a memory leak when using raw C++ arrays, but that should not be possible here, so I am not sure how to fix it. Any tips would be much appreciated. Thanks.

I'm seeing dozens of 128x128x128 structures. That can easily balloon your memory usage, since each of those needs at least 16 MiB (128^3 elements × 8 bytes is exactly 16 MiB for a tensor of double, and twice that for complex<double>). I don't think this is enough to reach 10 GiB, but you also say this isn't your actual code. Unless you're allocating more objects than you need, I don't see how to possibly optimize this. Nothing you've said suggests there's a bug; it just sounds like the problem you're solving requires a lot of memory. What makes you think this memory usage is abnormal?
The code on Github segfaults on a read in r2cfft3d within the memcpy call:
memcpy(input_array, rArr.data(), (nx*ny*nz) * sizeof(fftw_complex));
The problem is that rArr contains real numbers but input_array contains complex numbers.
@helios

I don't think this is enough to reach 10 GiB, but you also say this isn't your actual code.

Yes, this isn't the actual code, but it's a good approximation and I am currently using it to test things. I might be confused, but when I run this code the available memory drops from about 22 GB (of 24 GB total) to around 12-13 GB. Am I misreading this?

What makes you think this memory usage is abnormal?


I was simply comparing against code I had written using raw C++ arrays, which is admittedly a very naive way of judging the memory performance here. I just thought there had to be a better way to handle large problems with the Eigen library. I could run this on a supercomputer, but the way the memory usage "spikes" so high made me consider a possible memory leak or some general issue with memory performance. Thanks!
@mbozzi
The code on Github segfaults on a read in r2cfft3d within the memcpy call:
memcpy(input_array, rArr.data(), (nx*ny*nz) * sizeof(fftw_complex))
The problem is that rArr contains real numbers but input_array contains complex numbers.


Hmm, I am not sure what a "segfaults on a read" error is, but from the documentation this FFTW routine is defined as follows:
fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
                           fftw_complex *in, fftw_complex *out,
                           int sign, unsigned flags);

so it takes complex input and returns complex output after performing the chosen transform (i.e. FFTW_FORWARD).

There could be a problem with the way I have written this function, but at least I know it returns "correct" FFT values when compared against MATLAB, for example.
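
Looking at the manual again, there is also a real-to-complex variant that takes a double* input directly, which as far as I can tell would avoid the conversion copy altogether:

fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
                               double *in, fftw_complex *out,
                               unsigned flags);

(Note that its output array is smaller, n0*n1*(n2/2+1) complex values, because of the Hermitian symmetry of a real-input transform.)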
When you have memory leaks you don't see spikes, you see an upwards trend in memory usage, even as nested scopes terminate. Things that should have been deallocated, aren't. A memory usage spike suggests your functions are allocating a ton of memory for their local variables, and then those variables are being allowed to go out of scope, which is consistent with what I'm seeing in the source.

Something I'm noticing is that some of these functions are absolutely gigantic, and that just about everything is being allocated in the bottom scope. Are you limiting your objects to the minimum scope possible?
For example, if you have
T x = foo();
T y = bar(x);
T z = baz();
T a = snafu(y, z);
After the call to bar(), x is no longer needed, and it will continue to hang around in memory until the current function returns. You could rewrite this code into
T y;
{
    T x = foo();
    y = bar(x);
}
T z = baz();
T a = snafu(y, z);
or, more succinctly,
T y = bar(foo());
T z = baz();
T a = snafu(y, z);
By trimming down the required lifetimes of your objects you force the program to keep fewer of them alive at the same time, thus reducing memory usage. Depending on the program, this can in certain cases have a dramatic effect on the memory requirements, sometimes asymptotically.
"Segfaults on a read" means your code attempts to read from a memory location that the system forbids it from accessing.

Most of the time this means the program is accessing an array out of bounds. In this case the out-of-bounds access happens inside memcpy, because the source of the copy, rArr.data(), is too small: it is not guaranteed to contain (nx*ny*nz) * sizeof(fftw_complex) bytes, because it is an array of double, not fftw_complex. Since fftw_complex is twice the size of double, the memcpy reads twice as many bytes as rArr actually holds.
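
A minimal sketch of one possible fix, using the variable names from the thread: widen each real value into the complex buffer explicitly instead of memcpy'ing with the wrong element size (the helper function itself is hypothetical):

#include <fftw3.h>

// Copy nx*ny*nz real values into a complex buffer: real part = value, imaginary part = 0.
void copy_real_to_complex(const double* rArr, fftw_complex* input_array,
                          int nx, int ny, int nz)
{
    const long long n = static_cast<long long>(nx) * ny * nz;
    for (long long i = 0; i < n; ++i) {
        input_array[i][0] = rArr[i];  // real part
        input_array[i][1] = 0.0;      // imaginary part
    }
}

Alternatively, an r2c plan (see fftw_plan_dft_r2c_3d above) would take the real data as-is and make the copy unnecessary.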