how to optimise a very simple function

Profiling revealed that 25% of my entire program is spent calling the following function:

// cycle-step function
void horizon_value::step(double output) {
	cummulative_error += abs(output - solution_data_horizon_output[t++]);
}


I have made as many logical optimisations as I can think of, but now I was hoping someone could tell me how I could maintain the behaviour of the function while improving its execution time somewhat. Would it be possible to improve this function with some inline assembly? How?

Roughly 40% of the time spent executing this function is spent in abs(), so I could potentially make some gains there. Note that all the variables are doubles. I am using abs instead of fabs because I was told that they are identical under VC++. Correct me if I'm wrong.
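(For reference, a minimal sketch of the abs/fabs distinction being discussed here: in C++, std::abs from <cmath> has a double overload that behaves like fabs, while the C-style abs from <cstdlib> takes an int. The headers and values below are my own assumptions, not something confirmed in the thread:)

#include <cmath>    // std::abs has a double overload, equivalent to std::fabs
#include <cstdlib>  // ::abs(int) is the C integer version and would truncate a double

int main() {
    double x = -2.7;
    double a = std::abs(x);   // floating-point overload, returns 2.7
    double b = std::fabs(x);  // same result
    return (a == b) ? 0 : 1;  // 0 if the two calls really are identical
}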
You have turned on optimizations before profiling, I hope.


25% doesn't say much. Is it much, is it little? It depends on what else your program does. Do you think your program is too slow, or are you just optimizing for the fun of it?
closed account (o1vk4iN6)
Do as Peter87 says, fiddle with your compiler settings, perhaps try different compilers as well.

I don't see how abs() would be slow. How much time is your program taking to execute overall? Look at the assembly and see specifically which instruction inside abs is taking the most time. In my experience profilers are sometimes off; I had one tell me a pop instruction was taking 20 seconds. I'd use Intel's profiler if you have an Intel processor, and I'm not sure whether AMD has one, but I'd think they would.

If I remember correctly, the first bit of a double is the sign bit, so you could just do some bit operations to make sure it isn't set. This might reduce portability, though, so if you're really worried that abs() is causing a performance hit, I'd put it in an inline function or something.
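(For illustration, a minimal sketch of the sign-bit trick described above; the mask constant and the use of memcpy to keep the type punning well defined are my additions, and it assumes 64-bit IEEE-754 doubles:)

#include <cstdint>
#include <cstring>

// Clear the sign bit (the most significant bit of a 64-bit IEEE-754 double).
inline double abs_bits(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined way to reinterpret the bytes
    bits &= 0x7FFFFFFFFFFFFFFFULL;         // zero out the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

(In practice a decent optimizer usually compiles fabs on a double down to a single sign-masking instruction anyway, so measure before and after rather than assuming this wins anything.)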
OK, thanks, that gives me something to work with. This function was sampled as often as another function running in serial with this one, and its size is on the order of kFLOPs, so I'll do some fiddling with the settings and get back to you...
Peter: 25% is high given that I have another function with significantly more work taking up roughly the same amount of time... that seems a little illogical to me.
When a function that simple is taking so much time, it is bad news. Generally, the performance of code is measured by running it for a long time (maybe hours), since the operating system handles other jobs at the same time. The same code run several times will take slightly different amounts of time, which can be on the order of seconds, so running a program for only a few seconds means nothing. If you have a huge array, it is no wonder all the time is spent inside abs accessing memory. I would suggest the following:

1. How long does your program run for? If it is only a few seconds, run it inside a loop so that at least 5-10 minutes are spent inside your program (see the timing sketch after this list), then look at what the profiler says.

2. The function has two flops and a memory access. See if you can access the memory from cache; it can make a significant difference to performance. If possible, get the cache-miss counts from the profiler. If your program really does spend a long time inside this function, it is probably down to a lot of cache misses.

3. Generally, run performance experiments on a free machine that is not loaded with other intensive programs.
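(A self-contained sketch of point 1, using std::chrono to keep the hot loop running long enough to measure. The array size, repeat count, and fill values are placeholders, and the loop body only mimics the step() call rather than using the real horizon_value class:)

#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 1000000;
    std::vector<double> reference(n, 1.0), output(n, 0.5);
    double cumulative_error = 0.0;

    auto start = std::chrono::steady_clock::now();
    for (int repeat = 0; repeat < 100; ++repeat)
        for (std::size_t i = 0; i < n; ++i)
            cumulative_error += std::abs(output[i] - reference[i]);
    auto elapsed = std::chrono::steady_clock::now() - start;

    // Printing the accumulated value stops the optimizer from removing the loop.
    std::cout << "error sum: " << cumulative_error << "\n"
              << "elapsed:   "
              << std::chrono::duration<double>(elapsed).count() << " s\n";
}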

Hope this helps.
Indranil,
I hadn't thought about cache access... that's something I'm going to look into (I'd never given it much thought until now). The profiler was running for about 2 hours, so the sample size is certainly enough (the profile result file was 200 GB).
Cheers,
Arman
Make sure the function can be inlined. If you don't use LTO, that means it must go into the header if it is called from other translation units.
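(A minimal sketch of what that suggestion means in practice; the member types and initial values below are guesses, not taken from the thread:)

// horizon_value.h
#include <cmath>
#include <cstddef>
#include <vector>

class horizon_value {
public:
    // Defining the function inside the class body makes it implicitly inline,
    // so every translation unit that includes this header can inline the call.
    void step(double output) {
        cummulative_error += std::abs(output - solution_data_horizon_output[t++]);
    }

private:
    double cummulative_error = 0.0;
    std::vector<double> solution_data_horizon_output;
    std::size_t t = 0;
};

(With LTO enabled instead, /GL plus /LTCG on VC++ or -flto on GCC and Clang, the definition can stay in the .cpp file and still be inlined across translation units.)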
what's LTO?
LTO = Link Time Optimization