Optimizing Speed

Pages: 123
I'm just looking for general ways to optimize the speed of a program. I can't post the code, so don't ask. I just want any generic rules I can go by.

Also, how big a difference would be made by programming in assembly?
One generic rule is to click the box within the compiler gui that says "optimize for speed" as opposed to "optimized for size". Another generic rule is to follow your companies coding standards carefully as many of the issues have already been thought through by someone.
Where would this be for Dev-C++ 4.9.9.2. I know that it's really old, but that's what I have. I'd be willing to try a different compiler if you tell me that a newer one will produce more efficient machine code.

BTW, I'm not in a company, I'm just a 16 year old kid interested in programming. I'm studying independently, so I don't have teachers or co-workers to get help from.

Edit: I did find something that says 'optimization', and choosing 'best optimization' reduced my program's speed by about 50%.
Last edited on
Dev-C++ is a pretty horrible IDE (integrated development environment).

I like Visual Studio 2010, and a free version is available (it's made by Microsoft), from: http://www.microsoft.com/express/Downloads/


as far as Generic performance improvement ideas go:

1. Always check to see if a standard library function or container already exists that will do what you're needing it to. Chances are you can't write one using C or C++ that is faster than the standard library implementation. An example is the Visual Studio 2010 std::sort() function. (The STL algorithms are your friend)
http://www.cplusplus.com/reference/algorithm/sort/


2. Common Sense: Don't create variables inside of a loop, they can probably exist just fine outside of it, and it won't require allocating space and deleting it every iteration.

3. Compilers typically support a "Release" mode and a "Debug" mode, the Debug mode has checks and features to make debugging easier, the release mode will significantly increase performance in some cases.

4. Have common programming sense.

5. Have me optimize your code for you.

6. Use better algorithms when possible.

Ad 1. Only if you are a mediocre programmer. If you are a good programmer, it is quite easy to outperform STL. STL is designed to be general purpose. If you are smart, you can create a solution exactly tailored to your needs and you can leverage some properties of your problem, that are not used in STL (e.g. that the array you are sorting is already partially sorted), and if you don't screw something up, it should be faster.

Ad 2. This should be a last-resort method. Good compilers should do such microoptimisations for you.

Ad 6. This should be the first step.

Additionally:
Step 0: Design your architecture properly and optimisaiton will be easy.

Last edited on
closed account (z05DSL3A)
Use a profiler so you know what to optimise.
closed account (Lv0f92yv)
Also, if your solution involves recursion, consider replacing it with an iterative solution for release. Iterative solutions are almost always faster than recursive solutions, since calling functions requires use of the stack for each function call, and a lot of calls add up. Pushing function data and popping it over and over again takes HUGE amounts of time compared to a loop.

Some compilers do this conversion for you, in some cases. Others do not. Generally, try to avoid recursion when trying for efficiency unless the only iterative solution involves simulating the stack (which you probably cannot do as fast as it is done naturally).

Registers are super-fast, but machines only have a handful of them (platform specific). It might be a good idea to use registers when reading data from the same variable happens over and over again many times. This in theory will reduce circumvent the time it takes to access RAM and retrieve the value. Registers exist inside the processor, eliminating the bus transfer to get/put values. (Someone please correct me if I am confused here)

Choose data structures/storage mechanisms that are most efficient for what you want to do - an ordered list doesn't make sense if you don't care about order, a hash table doesn't make sense if you want order and don't plan on doing many lookups/insertions really fast. (Common sense if your solid with data structures and big O notation).
Last edited on
whats a profiler?
closed account (z05DSL3A)
wtf wrote:
whats a profiler?
wikipedia wrote:
A (code) profiler is a performance analysis tool that, most commonly, measures only the frequency and duration of function calls, but there are other specific types of profilers (e.g. memory profilers) in addition to more comprehensive profilers, capable of gathering extensive performance data.
http://en.wikipedia.org/wiki/Gprof

Registers are super-fast, but machines only have a handful of them (platform specific). It might be a good idea to use registers when reading data from the same variable happens over and over again many times. This in theory will reduce circumvent the time it takes to access RAM and retrieve the value. Registers exist inside the processor, eliminating the bus transfer to get/put values.


True, but you don't have any control over this if you are programming in C or C++.
The "register" keyword is mostly treated like a comment by compilers - they do register allocation by themselves. You would have to dive into assembly to take control of what is stored in the register, and what not, however I doubt you would get better results than the best C compilers available. In case you decide to do such microoptimisations, I would rather recommend you to use the "restrict" keyword to mark non-aliasing pointers and help the compiler do better register allocation and avoid some memory writes.


whats a profiler?


A tool that measures how much amount of CPU time is spent inside various parts of your code, and also how many times your functions are called.
Last edited on
If you are smart, you can create a solution exactly tailored to your needs and you can leverage some properties of your problem, that are not used in STL (e.g. that the array you are sorting is already partially sorted), and if you don't screw something up, it should be faster.


You can, but just beware that you will then have to use time debugging and optimizing your own work, which may or may not be what you want to do.
If you are a kid learning to program, I am not so sure that speed optimization is the first thing for you to worry about. I'd first be worried about learning the syntax of the language and writing a functional program first. There could be many programs where speed optimization makes no difference at all. In other words you could optimize a program to run faster but it would make no noticeable difference to the end user of the software. In fact I would worry that you'll spend too much time worrying about it and end up writing unreadable code.
http://en.wikipedia.org/wiki/Optimization_%28computer_science%29#When_to_optimize

As far as improving upon the STL, well hopefully you can prove that should you try it. If you can't prove that your creation is faster then an STL component then there isn't much point in trying to do it. Another consideration is that if you can't make a significant improvement then what is the point? Why reinvent the wheel for only a small performance improvement? A lot of times you can improve performance simply by using the containers correctly and by choosing the correct container for any particular job. In other words, if it isn't broken, don't fix it. Unless you can develop a set of measurable performance goals there isn't much point in writing your own container classes.
@kempofigher:
That may be true for most containers, but isn't it true that the STL doesn't even have a standard matrix class? I actually was able to make significant improvements to the AP matrix class that I had been using, thanks to the help of Duos... or (was it Disch... ) I'm talking about 7x-10x as fast for constructors and select member functions that are typically called repetitively. I have yet to attempt to try to optimize the whole class yet, I'll save that for another day. But I'd be skeptical about any blanket statements about the futility of improving the STL... Not saying one way or the other, because I'm not an expert, but I wouldn't be surprised at all if it can be done.
That's because a Matrix is a math thing and the STL is for programming. If you want Matrices and stuff, either get an external lib or use something like Fortran/Matlab/something made for calculations.
kempofighter said:
If you are a kid learning to program, I am not so sure that speed optimization is the first thing for you to worry about. I'd first be worried about learning the syntax of the language and writing a functional program first.


I am a just a kid learning to program, and I started learning C++ about two months ago, but I've actually written several functional programs. In some of these speed optimization would be very important, particularly one I am currently working on that is related to a math/programming contest.

On the subject of minimizing code within loops, I have realized that I have several variables being declared for every iteration. I'll try to fix these.

I'd like to know where I could get a profiler as well.

taylorc8 said:
Have me optimize your code for you.


I can't post the program I'm working on because it is related to a contest, so how about a different one:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <iostream>
#include <math.h>
#include <fstream.h>

unsigned long long Prime_Test(unsigned long long number);

using namespace std;

int main ()
  {
     ofstream outfile;
     outfile.open("Prime List2.dat", ios::app);
     outfile << 2 << ' ';
     if (!outfile)
        return 0;
     unsigned long long n;
     for (n=3; n!=99999999; n+=2)
     {
        if (n==0)
           break;
        if (Prime_Test (n))
           outfile << n << ' ';
     }
     outfile.close();
     return 0;
  }
   
unsigned long long Prime_Test(unsigned long long number)
  {
    unsigned long long i;
    if (number%2==0)
      return 0;
    for (i=3; i<= sqrt(number); i+=2)
    {
      if (number%i==0)
        return 0;
     }
    return (1);
   }  


It's pretty short, but if there's anything you can demonstrate then please do so.
Try using a better compiler.
#include <fstream.h>
shouldn't even compile.

One of the main bonuses to using C++ is having your operating system's API available to you. You can write native code if your operating system supports it to multi-thread your application.

Another benefit to using C++ is when writing native code you can tailor the program to the hardware the program will be running on.

Having me optimize it was mainly a joke, also. Though, it does seem as though you need a better algorithm for the code you have posted. Try implementing the Sieve of Eratosthenes with a vector<bool> or bitset.



Ad 1. Only if you are a mediocre programmer. If you are a good programmer, it is quite easy to outperform STL. STL is designed to be general purpose. If you are smart, you can create a solution exactly tailored to your needs and you can leverage some properties of your problem, that are not used in STL (e.g. that the array you are sorting is already partially sorted), and if you don't screw something up, it should be faster.


To me that sounds like a load of crap, but something to consider.

"Always check to see if a standard library function or container already exists that will do what you're needing it to."

That's almost a golden rule software engineers will use, it's right next to first checking to see if you have existing code that already does what you want. As for using your own method because you think you can make it faster, the algorithms are just as the person said, general purpose. They even have this nice iterator interface so you can sort only a range if you wish, or it's partially sorted.

Only if you are a mediocre programmer. If you are a good programmer, it is quite easy to outperform STL.


I'm not going to argue, but I will say that's going to cut into some development time. Oh, and I do wonder if you've ever done it before.
closed account (Lv0f92yv)
rapidcoder

Thanks for the clarification. I thought using the register keyword would tell the compiler to place the contents of this variable in a register - though I agree it probably wouldn't be a noticeable improvement even if that were the case (and you did actually get something put into a register).
There are ways you could write container classes that outperform STL.

But is it worth it? I'd say no. The time you'd have to spend writing and testing classes would likely outweigh any potential speed gain. Especially if you want to make your classes as feature-rich.

Plus many/most/all of the potential optimizations would have to be accomplished by changing the API and making it less user friendly, which means any code that uses the optimized classes will likely be more error prone and harder to follow.


So would you rather spend days coding/testing new classes and writing obfuscated code to use those classes? Or will you take the 15 nanosecond penalty hit for using the STL?
No one said about rolling your own STL like containers. I meant rather replacing that 10 lines of a tight loop that uses an STL vector to multiply 2 matrices with a carefully optimised, specialised solution using arrays and SSE4 intrinsics. :) STL should be a default, because it is general. But you often pay for genericity.

How would you multiply two matrices of complex numbers? The simple and general solution is to take some generic matrix class from some generic linear algebra library, create a type e.g. matrix<complex> and multiply them. But oops, you are doing 4 multiplicaitons now, while the optimal algorithm for this requires only 3 (per each pair of numbers, so this is not 15 nanoseconds for large data structures).

What if you want to multiply and add matrices? Sometimes creating your own specialised function multiply_and_add will be much faster than using a library-defined general purpose multiply and then add on large data structures (memory non-uniformness is your enemy).

There is nothing wrong with having complex code somewhere, as long as the complexity of it is well isolated.
Last edited on
@firedraco:

The AP (Advanced Placement from the "College Board", who produce and administer the SAT exams as well) Matrix classes do not include any mathematical functions. The functions to which I was referring that I was able to optimize were all constructors, indexers or assignment operators.

The point that I was trying to make was that if I, a novice, can optimize code (albeit with much help from the people of this forum :) produced by a professional organization, then surely it should be possible for experienced programmers to optimize the STL.

Sure there's going to be opportunity costs involved, but that goes without saying. No one is going to optimize any code without spending time that they could have used doing something else, whether it be working on a different project, attending school, camping or fishing. The same goes for writing any piece of code. Avoiding something because there is a chance you might not suceed always results in failure.

Notice I'm not saying anything about the quality of the STL. I don't know enough to be a judge on that. But I would second some of the opinions expressed on here, that with specific goals and benchmarks in place, attempting to optimize any piece of code could be a rational decision.

For example, when I was reworking my sudoku program I decided that 7x faster implementation for a constructor of a class which may get called hundreds of thousands of times during a call to a function that generates permutations of a given sudoku until a desired difficulty range is reached, is more than an ideal candidate for optimization. (Heck, I would have optimized had I been able to get 3x or even 2x as fast). I did this even before I attempted to fiddle with my algorithms for 2 reasons.

1. To gurantee that I wouldn't get lazy later and forget about it.

2. I had a clear and reasonably guranteed result in sight. My algorithm, admitedly slow, was proven to work, and I wasn't positive I could improve on it.

I would have probably spent as much time on reworking the algorithms (if not more, because I probably would have wound up having to rework them multiple times) and I'm reasonably sure that when I do get around to it (I'm so burnt out right now i'm taking a short break and working on other projects) I will be able to speed up my final program even more significantly. Primarily due to the fact the optimized code portions such as constructors only take up like 1 line of code in a larger loop, and the algorithms could in theory reduce the entire loop, so 2x as fast for an algorithm might end up being faster than 7x as fast for an optimized container, but the bottom line is 7x as fast simply CANNOT be ignored.

This thread has given me some aditional ideas, such as not declaring variables inside a loop whenever possible. I had not thought of that before.
Pages: 123