- Optimization with Eigen Tensors and Open

I have an older post about optimization with Eigen matrices where the answers were EXTREMELY helpful. The code I used there is 2D, and the optimization flags work pretty nicely. Now I have extended my code to 3D using Eigen tensors, and some of the quick optimization suggestions do not work as well. For example, the biggest domain I could run on my computer without it "crashing" is `nx, ny, nz = 64;`. Ideally, I would like something at least twice as large, so `nx, ny, nz = 128; //or 256`.

My CPU has 6 cores, which seems okay enough, so I am trying to use simple parallelization with OpenMP to speed up my code. Is there a simple way to use OpenMP that would speed up the code enough that I can actually double the size of my domain? Is that doable, or should I consider running this on a cluster or supercomputer? I will include the full 3D code example for those who are interested in taking a look. I will say: be careful running it on a personal laptop. Thanks!

Link:

https://www.zipshare.com/download/eyJhcmNoaXZlSWQiOiI3NzZkYzdjYi01YjI2LTRhN2UtODg0MS1jOTYyZDgxZDk0YTMiLCJlbWFpbCI6ImFzdHJvbHVqeUBnbWFpbC5jb20ifQ==

Do you realize that going from 64×64×64 to 128×128×128 means you will have 8 times as many elements?

Going all the way to 256×256×256 means you will have 64 times as many elements.


256^3 doubles is about 128 MB per array (16,777,216 elements × 8 bytes), so even a handful of such arrays is probably OK, assuming that a 6-core machine has 16-32 GB of RAM.
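A quick way to sanity-check these sizes is to compute them. The sketch below is only an illustration: the function names are made up for this example, and it counts a single array of `double` per grid, whereas the real code holds several arrays (how many is not known here).

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

// Number of grid points in an n x n x n domain.
constexpr std::uint64_t grid_points(std::uint64_t n) { return n * n * n; }

// Bytes needed for ONE array of doubles on an n^3 grid.
constexpr std::uint64_t bytes_for_doubles(std::uint64_t n) {
    return grid_points(n) * sizeof(double);
}

// The same size expressed in MiB (1 MiB = 1024 * 1024 bytes).
constexpr double mib_for_doubles(std::uint64_t n) {
    return static_cast<double>(bytes_for_doubles(n)) / (1024.0 * 1024.0);
}

// How many such arrays fit into a given RAM budget (hypothetical helper).
inline std::uint64_t arrays_that_fit(std::uint64_t n, std::uint64_t ram_bytes) {
    assert(n > 0);
    return ram_bytes / bytes_for_doubles(n);
}
```

With these definitions, `mib_for_doubles(64)`, `mib_for_doubles(128)` and `mib_for_doubles(256)` come out as 2, 16 and 128 MiB per array, which matches the 8× and 64× growth in element count.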

OpenMP is not for splitting work across a bunch of computers; that is MPI's job.

Multi-threading is the usual answer for splitting work up on the same computer, e.g. 5 working threads and one master that is mostly idle (allowing the OS some space to work), and OpenMP is a standard way to get that from compiler directives.

So OpenMP is not overkill on a single machine; MPI probably would be.

So you need to decide the scope of the problem you are trying to solve, really.

For reference, a typical new computer has 20 or so cores and 32-64 GB of RAM, and typically has at least one fast (SSD) drive. Before you move to multiple computers or exotic hardware, consider how big a problem that can solve.

Also note that some work can run on the graphics card, which has many, many cores, but they are limited in what they can do, as is the memory available to them.
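For threading on one machine, a minimal OpenMP sketch looks like the following. It is not the poster's code and does not use Eigen: `Field3D`, `scale_add` and `field_sum` are hypothetical names for a plain flat array, chosen only to show the `parallel for` pattern on a 3D loop nest. With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D field; NOT the poster's Eigen tensor code.
struct Field3D {
    int nx, ny, nz;
    std::vector<double> data;

    Field3D(int nx_, int ny_, int nz_, double value)
        : nx(nx_), ny(ny_), nz(nz_),
          data(static_cast<std::size_t>(nx_) * ny_ * nz_, value) {}

    // Row-major flat index: k is the fastest-varying index.
    std::size_t index(int i, int j, int k) const {
        return (static_cast<std::size_t>(i) * ny + j) * nz + k;
    }
};

// out(i,j,k) = a * in(i,j,k) + b. Every iteration writes a distinct element,
// so the loop nest can be shared among threads without any locking.
void scale_add(const Field3D& in, Field3D& out, double a, double b) {
    assert(in.nx == out.nx && in.ny == out.ny && in.nz == out.nz);
    // collapse(2) merges the i and j loops into one iteration space.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < in.nx; ++i)
        for (int j = 0; j < in.ny; ++j)
            for (int k = 0; k < in.nz; ++k)
                out.data[out.index(i, j, k)] = a * in.data[in.index(i, j, k)] + b;
}

// Sum of all elements; the reduction clause gives each thread a private
// partial sum and combines them at the end of the loop.
double field_sum(const Field3D& f) {
    double total = 0.0;
    const long long n = static_cast<long long>(f.data.size());
    #pragma omp parallel for reduction(+ : total)
    for (long long idx = 0; idx < n; ++idx)
        total += f.data[idx];
    return total;
}
```

The number of threads can be set with the `OMP_NUM_THREADS` environment variable. Since the profiling mentioned in this thread points at memory traffic as the bottleneck, more threads may help less than the core count suggests; that would need to be measured.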


I realized you can decrease memory pressure by using Eigen's `noalias` member functions where appropriate:

https://eigen.tuxfamily.org/dox/group__TopicAliasing.html

This is probably important because, as profiling showed in the other thread, the CPU spends at least 90% of its time waiting for data to move to and from memory.

Also, the code I gave you earlier creates threads in `potentialk`. Try using your original serial version of that function while playing with OpenMP, because I think the two are likely to conflict.


@mbozzi

Thanks, I am looking at it. Trying to test several things at a time unfortunately.
