Measuring running time inside a loop

Hi,
I want to measure the running time of a function run() with two different parameters, 'A' and 'M'. If I call run('A') once on its own and then run('M') once on its own, run('M') is faster. However, when I call them inside a loop, the timings change and run('A') becomes faster!
Do I need to separate the two calls inside the loop, or pause for some time before executing run('M')? How can I do that in C++? I think calling them inside the loop this way leads to inaccurate run-time measurements, but I don't know how to fix it.

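#include <ctime>
#include <iostream>

// Elapsed time end - start, normalized so that 0 <= tv_nsec < 1e9
timespec diff(timespec start, timespec end)
{
    timespec d;
    d.tv_sec = end.tv_sec - start.tv_sec;
    d.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (d.tv_nsec < 0)
    {
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}

void run(char c)
{
    timespec time1, time2, time3;
    if (c == 'A')
    {
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
        // do some calculations
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time3);
        time1 = diff(time2, time3);
        // print milliseconds, including whole seconds (tv_nsec alone wraps every second)
        std::cout << "A: " << time1.tv_sec * 1e3 + time1.tv_nsec * 1e-6 << " ms\n";
    }
    if (c == 'M')
    {
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
        // do other different calculations
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time3);
        time1 = diff(time2, time3);
        std::cout << "M: " << time1.tv_sec * 1e3 + time1.tv_nsec * 1e-6 << " ms\n";
    }
}

int main() // main must return int in standard C++
{
    int numberOfScenarios = 100;
    for (int i = 1; i <= numberOfScenarios; i++)
    {
        run('A');
        run('M');
    }
}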
What exactly are you trying to measure?

The first time you call run(), it's probably in main memory.
But for such a small amount of code, all subsequent runs are going to be fetching instructions from L1 cache.
https://stackoverflow.com/questions/14707803/line-size-of-l1-and-l2-caches

You could try flushing the cache between tests.
https://man7.org/linux/man-pages/man2/cacheflush.2.html
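Note that cacheflush(2) is not available on every architecture, so on a typical desktop PC a common workaround is to evict the caches yourself by streaming through a buffer that is larger than the last-level cache. The sketch below assumes a 64 MiB buffer and a 64-byte cache line, both guesses you should adjust for your CPU; `evictCaches` is a hypothetical helper name. It mainly pushes data out of the data and unified caches and will not reliably clear the L1 instruction cache.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: walk a buffer assumed to be larger than the
// last-level cache so that most previously cached lines are evicted.
// 64 MiB is an assumption; pick a few times your L3 size.
unsigned long evictCaches(std::size_t bytes = 64u * 1024u * 1024u)
{
    static std::vector<unsigned char> buffer;
    if (buffer.size() < bytes)
        buffer.resize(bytes);

    unsigned long sum = 0;
    // Touch one byte per 64-byte line (a common line size; an assumption here).
    for (std::size_t i = 0; i < bytes; i += 64)
    {
        buffer[i] += 1;       // write, so the line is really brought in
        sum += buffer[i];
    }
    return sum; // returning the sum keeps the compiler from removing the loop
}
```

You would call `evictCaches()` (untimed) right before each timed `run('A')` or `run('M')`, so both start from a similarly cold cache.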


Normally, you would calculate an average for a given case.
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
for(int i=1; i<= numberOfScenarios; i++)
{
   run('M');
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time3);
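time1=diff(time2,time3);
// a timespec can't be divided directly, so convert to milliseconds first
double averageMs = (time1.tv_sec * 1e3 + time1.tv_nsec * 1e-6) / numberOfScenarios;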


// do some calculations
and
// do other different calculations

are actually calls to functions that run different robotic path-planning algorithms; I wrote them this way for simplicity. I want to compare the running times of algorithms A and M.

I put them in one loop because I want to pass them the same parameters (specific to robot path planning), and then write the running times of A and M to the same Excel sheet so I can analyze and compare the results.

int main()
{
int numberOfScenarios=100;
for(int i=1; i<= numberOfScenarios; i++)
{
run('A');
//write the run time in excel sheet
run('M');
//write the run time in the same excel sheet
}
}

So why not
int numberOfScenarios=100;
for(int i=1; i<= numberOfScenarios; i++)
{
    run('A');
    //write the run time in excel sheet
}

for(int i=1; i<= numberOfScenarios; i++)
{
    run('M');
    //write the run time in the same excel sheet
}

It's easy to rearrange the results file however you want afterwards.
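As a sketch of that rearranging, you could collect each algorithm's timings in its own loop and then write them side by side to a CSV file, which Excel opens directly. This assumes a hypothetical variant of `run()` that returns its elapsed milliseconds instead of printing them; `writeTimingsCsv` and the column names are illustrative only.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write per-scenario timings for both algorithms side by side as CSV.
// Row i pairs the i-th A timing with the i-th M timing (same scenario).
bool writeTimingsCsv(const std::string& path,
                     const std::vector<double>& msA,
                     const std::vector<double>& msM)
{
    std::ofstream out(path);
    if (!out)
        return false;
    out << "scenario,A_ms,M_ms\n";
    for (std::size_t i = 0; i < msA.size() && i < msM.size(); ++i)
        out << (i + 1) << ',' << msA[i] << ',' << msM[i] << '\n';
    return static_cast<bool>(out); // false if any write failed
}
```

In the first loop you would `msA.push_back(...)` each A timing, in the second loop `msM.push_back(...)` each M timing, and finally call `writeTimingsCsv("timings.csv", msA, msM)`.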

> If I call run('A') alone once then run('M') alone once, run('M') is faster.
> However, when I use a loop to run them the running time is different, run('A') becomes faster!
Do you have actual numbers to show us?

How does
 
run('A'); run('M');

compare to
 
run('M'); run('A');

compare to
 
run('A'); run('M'); run('A');

compare to
 
run('M'); run('A'); run('M');


Have you tried any of the cache invalidate suggestions?
Thank you for your help

If I run('A') multiple times, it gives ==> 0.9, 0.3, 0.5 ....
If I run('M') multiple times, it gives ==> 0.2, 0.5, 0.8 ....
So I think I need to calculate the average.
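With results that scatter like this, the average alone can be misleading. A small summary that also reports the spread and the median helps. The sketch below assumes the timings have already been collected into a `std::vector<double>`. `Summary` and `summarise` are hypothetical names, and the standard deviation is the sample version (n - 1 in the denominator).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Summary
{
    double mean;
    double stddev; // sample standard deviation (n - 1 in the denominator)
    double median;
};

// Summarise a non-empty set of timings. The median is less sensitive than
// the mean to one-off outliers such as an OS interruption.
Summary summarise(std::vector<double> samples)
{
    const double n = static_cast<double>(samples.size());
    const double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;

    double sq = 0.0;
    for (double s : samples)
        sq += (s - mean) * (s - mean);
    const double stddev = samples.size() > 1 ? std::sqrt(sq / (n - 1)) : 0.0;

    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    const double median = samples.size() % 2 == 1
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2.0;

    return {mean, stddev, median};
}
```

If the standard deviation is about as large as the difference between A and M, you need more runs (or less noise) before concluding which one is faster.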

Have you tried any of the cache invalidate suggestions?

Could you please give me a short example using my code?
@EmanCS,
You know, it might be simpler if you used the __TIME__ macro. I'm not quite sure how you would implement that in your function, but it's something you might consider.
Example:
#include <iostream>
int main (int argc, char const *argv[])
{
 // Outputs the exact time program was compiled
      std::cout << __TIME__ << std::endl;
      return 0;
}
Output:
15:51:18


The Preprocessor Directives tutorial on this website defines the __TIME__ macro as:
__TIME__ A string literal in the form "hh:mm:ss" containing the time at which the compilation process began.
The keyword being "time of compilation".
So you run it a few hours later, it still prints "15:51:18"
You run it tomorrow, it shows "15:51:18"
You run it 5 years later, it shows "15:51:18"

__DATE__ and __TIME__ are useful in your program's help section, so when you get a bug report, you know exactly which version of the s/w they're talking about.
@salem c,
Well, OK: my code editor (TextMate) has a cmd-R command that compiles and runs the program. It doesn't accept input, but I use it to check the code for errors. Every time you press cmd-R, it recompiles and runs, so you get a different time each time (no pun intended). Sorry about that, my mistake. TextMate also shows me how long the program took to run.
An alternative to flushing the cache is to run the code once without timing it, so that it gets pulled into the cache before the timed run.
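A minimal sketch of that warm-up approach, assuming the workload can be wrapped in a callable (for example `[] { run('A'); }`). `cpuMsAfterWarmup` is a hypothetical name. It uses the same `CLOCK_PROCESS_CPUTIME_ID` clock as the question.

```cpp
#include <ctime>

// Run `work` once untimed so its code and data are more likely to be cached,
// then time a second run with process CPU time. Returns milliseconds.
template <typename Work>
double cpuMsAfterWarmup(Work work)
{
    work(); // warm-up run, result discarded

    timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    work();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    return (end.tv_sec - start.tv_sec) * 1e3 + (end.tv_nsec - start.tv_nsec) * 1e-6;
}
```

Warming up both A and M this way measures their "hot cache" cost, which is usually what you want when comparing algorithms run many times in a row.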

You may also want to set the priority of the main thread to real-time (or the highest available setting), so it is less likely to be interrupted by unrelated operating-system activity.
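On Linux, one way to request this is the POSIX `sched_setscheduler()` call with the `SCHED_FIFO` policy. The sketch below assumes Linux. It usually fails with `EPERM` unless the program runs with root rights or the `CAP_SYS_NICE` capability, and Windows needs a different API entirely. `requestRealtimePriority` is a hypothetical name.

```cpp
#include <cerrno>
#include <sched.h>

// Linux sketch: ask for the SCHED_FIFO real-time policy at its highest
// priority for the calling thread (pid 0 means "the caller").
// Returns 0 on success, otherwise the errno value (often EPERM).
// Caution: a long busy loop at this priority can starve other work on that core.
int requestRealtimePriority()
{
    sched_param param{};
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

Call it once at the start of `main()` and check the return value; if it fails, the timings are still valid, just noisier.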

You can also try timing it with the <chrono> tools.

There will always be some variation in the results, but you can reduce the noise.

So here are some monkey wrenches:

1) System caching - there are levels of speed at which a program can operate, depending on where its code and data sit in your computer. That level can change for many reasons, but generally because the program is running for the first time, or because another program took priority and pushed yours out of the faster level. If it sits on the hard drive and needs to be retrieved, it will take a long time. If it sits in RAM, it will be quicker, but still seem sluggish when you are hoping for an optimized result. If it sits in L1, L2 or L3 cache, which resides on the CPU itself, things are going to be very quick. It's nearly impossible to say which level your data is sitting at, at any given time, on every computer.

2) Compilers implement things in different ways, so code compiled with MinGW may behave and perform differently from code compiled with Cygwin.

2.5) Operating systems work in different ways, so code run on Windows will give different results than the same code run on macOS and/or Linux. Many functions in the C++ libraries are actually routed to system calls in some way.

3) Different platforms (ARM vs x86_64) - I think you can guess this from 2 and 2.5: different implementations take different routes to the same result.

In general, it's fine to test the speed of one function against another when it's just for your own use, but don't treat that result as a rule. It gets messy if you expect the results to be the same for everyone. Your best option is to understand Big O, do your research, and just try to do things as efficiently as you can. Don't get sucked down the wrong optimization rabbit hole.

https://www.youtube.com/watch?v=p8u_k2LIZyo <- how researching general computer hardware and standards, a little calculus, and the most basic tools can produce impressive results.
^^^ It depends, of course.
1) Caching matters (a LOT), but if you measure CPU clock cycles instead of time, you can get a sense of the work done.

And the last point:
Sometimes code runs on just one platform, e.g. console games or embedded systems. There, it is not only OK to tweak for the system at hand but encouraged. If it's a more general-purpose tool that will run on multiple OSes, the above is critical to avoid making a mess, though of course you can #ifdef a few versions for the biggest OSes you target, so that each compiled program gets the best version for its platform (this is really annoying; the code needs to be very important to bother with all that).
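As a small illustration of that `#ifdef` approach, here is a sketch of a per-OS high-resolution timer. `nowMs` is a hypothetical name. The Windows branch uses `QueryPerformanceCounter`/`QueryPerformanceFrequency`, and the POSIX branch uses `clock_gettime(CLOCK_MONOTONIC)`. The choice of these two clocks is an assumption of this sketch, not the only option.

```cpp
#if defined(_WIN32)
#include <windows.h>

// Windows: high-resolution monotonic counter converted to milliseconds.
double nowMs()
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return static_cast<double>(count.QuadPart) * 1e3 / static_cast<double>(freq.QuadPart);
}
#else
#include <time.h>

// POSIX (Linux, macOS): monotonic clock converted to milliseconds.
double nowMs()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec * 1e-6;
}
#endif
```

The rest of the program calls `nowMs()` twice and subtracts, without caring which branch was compiled. In practice `std::chrono::steady_clock` already hides this difference, which is one reason to prefer it unless you need a platform-specific clock.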

Hardware has largely outpaced the need for fast code. In 1990, Python would have been DOA as a language. Today, with home CPUs fast closing in on teraflop capacity, it can be tolerated. You can use bubble sort on a million records and the user won't notice; when I was learning, we left it running overnight to sort a few thousand. Truly, as said above, understanding Big O and basic algorithm design is sufficient for the overwhelming majority of work.