How to optimize code?

It's 'easy' to create a function, but what makes me think is: what is the best way to optimize the code? How can we gain more CPU speed? How can we test the speed?
Are the same methods compatible with other programming languages?
what is the best way to optimize the code?

Turn on compiler optimizations.

If you're still having a performance problem then run your program through a profiler* to see where your program spends most of its time and then focus on trying to optimize those parts.

* On Linux you can use Callgrind [a Valgrind tool] + kCachegrind [to view the result].
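If you just want a rough wall-clock number without a profiler, something like std::chrono::steady_clock works. A minimal sketch (render_frame here is just a made-up workload standing in for whatever you actually want to time):

#include <chrono>
#include <cmath>
#include <iostream>

// Made-up workload; replace with the code you actually want to measure.
double render_frame()
{
    double sum = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        sum += std::sqrt(static_cast<double>(i));
    return sum;
}

int main()
{
    using clock = std::chrono::steady_clock;

    constexpr int runs = 100;
    volatile double sink = 0.0;   // keeps the optimizer from deleting the work

    const auto start = clock::now();
    for (int i = 0; i < runs; ++i)
        sink = render_frame();
    const auto stop = clock::now();

    const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    std::cout << "average: " << ms / runs << " ms per call\n";
    (void)sink;
}

Build with optimizations turned on when you measure, otherwise the numbers don't mean much.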

The biggest savings can often be found in loops (especially nested loops). If you can improve the code inside the loop it can have a big impact, and if you can reduce the number of iterations that can also have a big impact.
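A made-up sketch of what that looks like in practice, pulling a loop-invariant computation (and the per-row offset) out of the inner loop (the names are invented):

#include <cmath>
#include <cstddef>
#include <vector>

// Before: the same scale factor is recomputed for every single pixel.
void fill_slow(std::vector<float>& img, int width, int height, float z)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            img[static_cast<std::size_t>(y) * width + x] = x * (1.0f / std::sqrt(1.0f + z * z));
}

// After: the expensive part is computed once, and the row offset once per row.
void fill_fast(std::vector<float>& img, int width, int height, float z)
{
    const float scale = 1.0f / std::sqrt(1.0f + z * z);   // loop invariant
    for (int y = 0; y < height; ++y)
    {
        float* row = img.data() + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x)
            row[x] = x * scale;
    }
}

A good compiler may hoist some of this on its own, but writing it this way makes the intent obvious and doesn't depend on the optimizer.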

Then there are a lot of small things that are always good to keep in mind while writing code, e.g. avoid unnecessary copies, don't do more work than necessary, prefer std::vector over std::list, avoid pointers and dynamic memory allocations unless you have a good reason, etc. The more experienced you get, the longer the list becomes. You won't even think of these as "optimizations"; it will just become the normal way you write code.
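For example, two of the most common accidental copies, in a small invented sketch:

#include <cstddef>
#include <string>
#include <vector>

// Pass large objects by const reference instead of by value.
std::size_t total_length(const std::vector<std::string>& names)   // not: std::vector<std::string> names
{
    std::size_t sum = 0;
    for (const std::string& name : names)   // the '&' avoids copying every string
        sum += name.size();
    return sum;
}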
This is where computer science becomes an art, really.

the first cuts:
- actually profile the code on the target platform(s) and show that the code in question is not meeting its performance goal. You should have a goal for the time taken on some sized input / benchmark that you want to meet.
- be sure you are using the right algorithm for your problem. Your assembler-optimized bubble sort still stinks.
- be sure you are not doing stupid things like creating and destroying large things inside loops, including hidden loops.
- make sure you are not copying things when not needed, including forgetting & on range-based for loops, function parameters, etc.
- make sure you are threading the work if appropriate. Modern CPUs have 20 cores; not using them means going 20 times slower than needed, if the problem is suitable for threads.
- avoid unnecessary dynamic memory use. Mostly this mirrors the unnecessary copies or object create/destroy already mentioned, but keep an eye out for similar issues.
- be sure your compiler flags are set up properly: -O3 for example, whole-program optimization, inline settings, and so on.
- switches try to use a form of lookup table, so when speed matters a switch is usually the better way to do conditions (see the sketch after this list).
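A hypothetical sketch of that last point: the same small mapping written three ways (the values are made up):

// if/else chain
int cost_if(int kind)
{
    if (kind == 0) return 4;
    if (kind == 1) return 11;
    if (kind == 2) return 7;
    return 0;
}

// switch: with dense case values the compiler can emit a jump table
int cost_switch(int kind)
{
    switch (kind)
    {
        case 0: return 4;
        case 1: return 11;
        case 2: return 7;
        default: return 0;
    }
}

// explicit lookup table: often the cleanest when the range is small and known
int cost_table(int kind)
{
    static const int table[3] = {4, 11, 7};
    return (kind >= 0 && kind < 3) ? table[kind] : 0;
}

With only two or three cases the difference is usually noise; it starts to matter when there are many dense cases.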

most of the above is before you even touch the code looking to do weird, 1985 old-school crap. The old-school stuff is often already done for you by the compiler anyway, and trying to second-guess it is very difficult these days. If you did the above and it's still not good enough, there are ugly things you can do to try to help things along, but you need to be SURE you are poking the right place before you try to tune it.
The kinds of old-school stuff that still work: an integer pow function (where the exponent is an int, like square/cube) beats the built-in pow and similar slow math functions, factoring conditions (K-maps), using (comparing) squared distances to remove sqrt calls, ... anything like that where the compiler can't understand the logic (the compiler does not understand the distance formula; it can't factor out a root because that's not the same answer you coded for), bitwise logic, bypassing conditions (x = (y==3)*4 + (y==5)*11 instead of if/else), loop unrolling (the compiler can do it but does miss some), etc.
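Two of those as a quick sketch (types and names invented): comparing squared distances instead of calling sqrt, and the branchless select mentioned above:

struct Point { double x, y; };

// Compare squared distances when you only need to know which is closer: no sqrt call.
bool closer_than(Point a, Point b, double limit)
{
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy < limit * limit;
}

// Branchless select: same result as an if/else chain, but without branches.
int pick(int y)
{
    return (y == 3) * 4 + (y == 5) * 11;
}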
jonnin wrote:
Modern CPUs have 20 cores; not using them means going 20 times slower than needed, if the problem is suitable for threads.

Even when they are, the choice of algorithm can make a big difference, as demonstrated by Sean Parent in: https://www.youtube.com/watch?v=zULU6Hhp42w
Threading can be very tricky, but it's so often ignored in the 'how to make it faster' discussions, and it's so critical for many problems.
On Code::Blocks, with GCC, I use:
#pragma GCC optimize("Ofast")       // like building with -Ofast (-O3 plus fast, non-strict math) for the functions below
#pragma GCC target("avx,avx2,fma")  // allow AVX/AVX2/FMA instructions for the functions below
#include <iostream> 

It seems faster, but maybe I need more compiler options.
What the code does:
- draws a tiled image with vertical lines (using DIBs, Z and math).
I use GetTickCount() to calculate the FPS.
I get 35 FPS; the plane/wall is 1000 pixels wide and 500 pixels high.
Is only 35 FPS normal?
If I change the origin and destination of the line to convert the 3D to 2D, will it change the result? Will I gain more speed?
Is it possible to convert this 'if' into a 'case'?
if(p>=0 && (y<ImageHeight && y>0 ) && (x<ImageWidth && x>0) )
35 fps is very low. Most games, in 3d, run at 60 to 100 fps on a gaming computer.
2d is usually faster than 3d, yes.
Cases can do ranges, sort of. It just depends. Why are you checking that anyway? Can you not ensure that your values are in range (usually you can, for a fixed thing like an image width!)?

You make nested, decision-tree stuff into a switch/case.
For a single if statement it doesn't make sense, since you either do it or not (binary); a switch is not as useful for most of those. Instead, consider eliminating the condition, as demonstrated; there is a sketch of that below.
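To make that concrete, here is a hypothetical vertical-line drawer (names made up, assuming an 8-bit pixel buffer): clamp the range once before the loop and the per-pixel 'if' disappears:

#include <algorithm>

void draw_vline(unsigned char* pixels, int ImageWidth, int ImageHeight,
                int x, int y0, int y1, unsigned char value)
{
    if (x < 0 || x >= ImageWidth) return;   // whole line is off-screen
    y0 = std::max(y0, 0);                   // clamp the range once...
    y1 = std::min(y1, ImageHeight - 1);
    for (int y = y0; y <= y1; ++y)          // ...so the inner loop has no 'if' at all
        pixels[y * ImageWidth + x] = value;
}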

but, you immediately jumped past all the first cut stuff and down into the 1980s. Slow down a bit there... :)

FPS is not likely the issue. How many FPS do you get if you display the SAME image over and over, never changing it? That is what the hardware can do, so that is near your goal. How long does simply generating the image take (this is probably where all your time is spent!)? I don't know what you are doing, but let's make some assumptions:
35/sec against 1000x500x3 bytes of generated data is over 52 megabytes/sec, just multiplying all that out. That is a fair bit of crunching. Are you, hopefully, making frame 1 on CPU 1, frame 2 on CPU 2, frame 3 on CPU 3, in parallel already? Is that possible?
can you offload anything here to the graphics card? Graphics optimization is a whole different set of skills.
If you're number-crunching pixel by pixel on the CPU, you're inevitably going to suffer in terms of FPS. Use the GPU. Use GPU APIs like OpenGL.
Ganado: for now I'm using the CPU... yes!!!
I can learn DirectX, but I'm learning with math ;)

jonnin: are you telling me to use multithreading to draw the image? (in 3D we use zoom, depending on distance)
I can share the entire code, but in the future I might make the same errors. And it's big (but I can share the most important functions).
yes, try threading it, so that each image is partly generated even as the last one is showing up on the screen. See if that can punch it up to 40, 50 FPS or more?

I don't need to see your code yet.
I will create a new topic, then I can come back here ;)
At 35 Hz there's a lot of opportunity to render scanlines in parallel in a thread pool and then sync at each frame, rather than rendering only whole frames in parallel.
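A minimal sketch of that idea, without a real thread pool (just one std::thread per hardware thread, each taking every n-th row; render_row and the pixel layout are invented stand-ins for your own code):

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for whatever fills one scanline of the image.
void render_row(std::vector<unsigned char>& pixels, int width, int y)
{
    for (int x = 0; x < width; ++x)
        pixels[static_cast<std::size_t>(y) * width + x] =
            static_cast<unsigned char>((x + y) & 0xFF);
}

void render_frame(std::vector<unsigned char>& pixels, int width, int height)
{
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n; ++t)
        workers.emplace_back([&, t] {
            // each thread handles rows t, t+n, t+2n, ... so no two threads touch the same row
            for (int y = static_cast<int>(t); y < height; y += static_cast<int>(n))
                render_row(pixels, width, y);
        });

    for (std::thread& w : workers)
        w.join();
}

int main()
{
    const int width = 1000, height = 500;
    std::vector<unsigned char> pixels(static_cast<std::size_t>(width) * height);
    render_frame(pixels, width, height);
}

A real renderer would keep the threads alive between frames instead of recreating them every time, but the row split is the important part. With GCC you typically need to build with -pthread.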
Just so that you know what you're getting yourself into: Multithreading is not an easy subject. I would even want to go so far as to say it's one of the most difficult things in programming. By all means, go and study it and learn it but take the time to do it properly. It's not the sort of thing where you can just try something out and see if it works. You need to actually know what you're doing.
Furthermore, parallel computation requires that you have multiple things that calculate at the same time (i.e. independently). Images quite likely do, and that is why modern GPUs have way more "cores" than CPUs.

For about two decades now it has been possible to do arbitrary calculations on the GPU;
no image is drawn in "GPGPU": the results are loaded back to main memory.