measure execution time - always zero

I realized today that one can explicitly turn off (gcc) compiler optimizations by adding the -O0 flag, i.e. compile with g++ -O0 -o name.exe source.cpp . I used to think that no optimization was the default behaviour...
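To illustrate what I mean, here is a minimal sketch (the exp call and the loop count are just placeholders for my actual setup), compiled once with -O0 and once with, say, -O2:

#include <chrono>
#include <cmath>
#include <iostream>

int main()
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();

    // The result of this loop is never used anywhere. With -O2/-O3 the
    // compiler may delete the whole loop (dead-code elimination), so the
    // measured time can come out as (almost) zero. With -O0 the loop is
    // actually executed.
    for (int i = 0; i < 10000000; ++i)
    {
        double u = std::exp(0.001 * i);   // value discarded
        (void)u;                          // silence unused-variable warnings
    }

    auto t1 = clock::now();
    std::cout << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}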

I just tested my code by turning off optimization.

On my laptop (where both loops were actually executed even with optimizations turned on), the ratio of the two loops' run times is now different from the one I got with optimizations enabled.
On the cluster, I now do get a proper nonzero runtime for the second loop. However, the ratio there is also significantly different from the laptop results.

The influence of hardware and compiler on my measured runtimes is way too strong.
edit: I just realized that I did something wrong here. Need to rerun the experiments.

I also tried this approach (after turning on optimizations again):
jonnin wrote:
if you want the loop to run, do something slightly different.
eg
U+= instead of =
and print U at the end so it isn't discarded.
the timing will be good enough to get a sense of the work done.

I also added a trivial change to the input parameters of the functions, such as doubling one of the float inputs in each iteration. I added the same changes to the other loop as well.
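Roughly what I tried, as a minimal sketch (the function f, the loop count and the doubling of x are just stand-ins for my actual functions and inputs):

#include <chrono>
#include <cmath>
#include <iostream>

// Placeholder for one of the actual force/potential functions.
double f(double x) { return std::exp(-0.5 * x * x); }

int main()
{
    using clock = std::chrono::steady_clock;

    double U = 0.0;   // accumulate with += so the result is actually needed
    double x = 1.0;   // input that is trivially changed in every iteration

    auto t0 = clock::now();
    for (int i = 0; i < 1000000; ++i)
    {
        U += f(x);    // += instead of =, as suggested
        x *= 2.0;     // doubling one of the float inputs each iteration
    }
    auto t1 = clock::now();

    // Printing U at the end prevents the compiler from discarding the work.
    std::cout << "U = " << U << ", time = "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}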

While this indeed forces the loops to run, the resulting ratio on my laptop was significantly different from the case without these changes. I believe this is because the cost of the few additional operations relative to the cost of the function evaluation differs between the two loops, so the resulting ratio differs as well... This approach would probably only work when the functions in the loop are so expensive that everything else is completely negligible. In my small-scale tests, this is not the case...

Seems like this is the last thing to try:
Instead of measuring the costs of the force/potential functions, I simply pass the execution time of method A as an input argument to method B. While method B goes through its iterations, I use the chrono clocks to measure the elapsed time of each iteration, and I stop iterating once the total elapsed time reaches the given admissible execution time.
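As a minimal sketch of what I have in mind (method_B_step and the budget value are placeholders):

#include <chrono>
#include <cmath>
#include <iostream>

// Placeholder for one iteration of method B.
double method_B_step(double x) { return std::sin(x) + std::cos(x); }

int main()
{
    using clock = std::chrono::steady_clock;

    // Measured execution time of method A, passed in as the time budget.
    const double budget_seconds = 2.0;   // placeholder value

    double acc = 0.0;
    long long iterations = 0;
    const auto start = clock::now();
    double elapsed = 0.0;

    while (elapsed < budget_seconds)
    {
        acc += method_B_step(static_cast<double>(iterations));
        ++iterations;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    }

    std::cout << "method B ran " << iterations << " iterations in "
              << elapsed << " s (acc = " << acc << ")\n";
}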

But since I have experienced the influence of compilers / hardware first-hand, I don't think that this will be more consistent.

If one machine arrives at the result that method A is twice as expensive as method B, but another machine says that it is three times as expensive, then there is no point to this...

I used to think that no optimization was the default behaviour...

I'm pretty sure it is, although if you use an IDE, it might enable some optimizations by default ("debug mode" probably uses fewer optimizations than "release mode").

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-O0
-O0
     Reduce compilation time and make debugging produce the expected results. This is the default. 
If one machine arrives at the result that method A is twice as expensive as method B, but another machine says that it is three times as expensive, then there is no point to this...


some high performance code uses #defines to swap the code around depending on the detected or selected target system. It's perfectly valid to do that if it matters enough to have the code as optimal as possible on multiple machine types.
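eg something like this rough sketch (compute() is made up; __AVX2__ and _OPENMP are just examples of predefined macros you might branch on):

#include <iostream>

// Pick an implementation at compile time depending on the target/compilation
// settings. A project could equally branch on its own -D flags.
double compute(double x)
{
#if defined(__AVX2__)
    // Variant tuned for builds with AVX2 support.
    return x * x + 2.0 * x + 1.0;
#elif defined(_OPENMP)
    // Variant used when compiling with OpenMP enabled.
    return (x + 1.0) * (x + 1.0);
#else
    // Generic fallback.
    return (x + 1.0) * (x + 1.0);
#endif
}

int main()
{
    std::cout << compute(3.0) << '\n';
}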
Thanks guys, I appreciate your ideas.

I've spoken to my professor now (I am doing a PhD in applied maths). He said I should set this problem aside and simply count the FLOPs in the routines get_noisy_force_GM() and U_pot_GM() by hand lol.

This will be a convincing enough rough estimate for the mathematical research community...
Although, whilst I'm typing this, I realize that some floating-point operations, like additions, are obviously cheaper than, say, exponentiation, so I'm not sure how legit this idea is. Anyway, he knows what the research folks care about, so I will just follow his advice for the time being. Maybe I'll open another thread about counting FLOPs lol.

On another note: what would happen if one were to do this in Python? Without a compiler, would one get more consistent estimates? Or would the fact that Python (NumPy etc.) still runs compiled C/C++ code under the hood lead to the same issues?