do
{
    if (..)
    {
        return_val = performsomething(parm1, parm2..parm5);
    }
}
Now, performsomething(...) is written like this:
performsomething(x1, x2, ...)
{
    if (A_req)
    {
        performA();
    }
    else if (B_req)
    {
        performB();
    }
}
Looking at this closely, performsomething() gets called once for every bit.
I want to avoid the overhead of that many calls. I would like to refactor performsomething() so that it checks the conditions (A_req or B_req) and calls performA() or performB() accordingly. I also want to make performsomething() 'inline', so that only the underlying functions are called as needed.
Please help me code this efficiently.
It is difficult to have a true array of bits, because most systems cannot address anything smaller than a byte. I think vector<bool> was optimized to act like an array of bits, more or less, but it may be less efficient than the approach below: you have to access each bit one at a time via [], which has a cost, whereas with the approach below each [] access gives you a 64-bit block.
If it fits, though, a 64-bit integer IS an array of bits. You can do logic on it directly and efficiently. If you really dig into it, you can often operate on many bits at once in a single CPU cycle by choosing the right statement against the right value.
My guess is that the best way to refactor this is to hold the bits in integers and, if you need more than 64, to give yourself a vector of 64-bit ints big enough to support what you are doing.
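To make the "vector of 64-bit ints" idea concrete, here is a minimal sketch. The class name BitBlocks and its member functions are made up for illustration, not taken from the original code. The point is that set() and test() touch a single bit, while count() and none() work on a whole 64-bit word per step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a fixed-size bit array stored in 64-bit words.
class BitBlocks {
public:
    explicit BitBlocks(std::size_t bit_count)
        : bit_count_(bit_count), words_((bit_count + 63) / 64, 0) {}

    // Set bit i: pick the word (i / 64), then the bit inside it (i % 64).
    void set(std::size_t i) {
        assert(i < bit_count_);
        words_[i / 64] |= (std::uint64_t{1} << (i % 64));
    }

    // Read bit i.
    bool test(std::size_t i) const {
        assert(i < bit_count_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // True if no bit is set; each comparison covers 64 bits at once.
    bool none() const {
        for (std::uint64_t w : words_) {
            if (w != 0) return false;
        }
        return true;
    }

    // Count set bits; all-zero words are skipped without any per-bit work.
    std::size_t count() const {
        std::size_t total = 0;
        for (std::uint64_t w : words_) {
            while (w != 0) {
                w &= w - 1;  // clears the lowest set bit
                ++total;
            }
        }
        return total;
    }

private:
    std::size_t bit_count_;
    std::vector<std::uint64_t> words_;
};
```

With something like this, a loop that used to call a function for every bit can first check a whole word against zero and only do per-bit work where a bit is actually set.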
The compiler will inline for you where it can, provided you have optimization turned on.
If it decides not to inline, check whether your compiler supports something like __forceinline from Microsoft/VS. If it does not, you can #include "codefile", where codefile contains the code you want inlined, and force-fit it that way, or, if the function is super simple, you can use a macro to force the inlining. Be aware that inlining can actually cause slower execution on some systems in some scenarios. Inline is in the realm of voodoo when it comes to second-guessing the compiler and trying to make the code faster by hand. You can also inline by hand, by unrolling the loop yourself and putting the redundant code in the function. Most of these suggestions are terrible and should be avoided unless the code is absolutely critical for execution speed.
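As a rough sketch of the force-inline route: the FORCE_INLINE macro name below is made up, as are the bodies of perform_a() and perform_b(), which merely stand in for the real performA()/performB(). __forceinline is the MSVC keyword, and __attribute__((always_inline)) is the GCC/Clang counterpart. Whether this is faster than letting the optimizer decide has to be measured.

```cpp
#include <cstdint>

// Hypothetical portability macro; falls back to plain 'inline' elsewhere.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Placeholder bodies standing in for the real performA() / performB().
FORCE_INLINE std::uint32_t perform_a(std::uint32_t x) { return x + 1; }
FORCE_INLINE std::uint32_t perform_b(std::uint32_t x) { return x * 2; }

// The dispatcher from the question: check the condition, call A or B.
FORCE_INLINE std::uint32_t perform_something(bool a_req, bool b_req,
                                             std::uint32_t x) {
    if (a_req) return perform_a(x);
    if (b_req) return perform_b(x);
    return x;  // neither requested: pass the value through unchanged
}

// The hot loop. Once the dispatcher is inlined, the compiler is free to
// hoist the loop-invariant checks on a_req / b_req out of the loop.
std::uint32_t run_loop(bool a_req, bool b_req, std::uint32_t n) {
    std::uint32_t total = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        total += perform_something(a_req, b_req, i);
    }
    return total;
}
```

If A_req and B_req do not change inside the loop, you can also hoist the decision yourself: test the flags once and run one of two separate loops, each calling only its own function.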
If you are bit-twiddling, inline assembly may also be an option for performance. Your CPU can do a few things that are natural to it but tricky to convince C to do in one statement; one example is endian flipping.
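Before reaching for inline assembly for the endian flip, it may be worth trying compiler intrinsics, which usually map to the CPU's byte-swap instruction. The sketch below is an alternative to the assembly route, not the same thing: the function names are made up, while __builtin_bswap32 (GCC/Clang) and _byteswap_ulong (MSVC) are real intrinsics. The shift-and-mask version is a portable fallback that optimizers often recognize as a byte swap anyway.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#  include <stdlib.h>  // declares _byteswap_ulong
#endif

// Portable endian flip of a 32-bit value using shifts and masks.
inline std::uint32_t byte_swap32_portable(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Same operation, using a compiler intrinsic where one is available.
inline std::uint32_t byte_swap32(std::uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#elif defined(_MSC_VER)
    return _byteswap_ulong(v);
#else
    return byte_swap32_portable(v);
#endif
}
```

For example, byte_swap32(0x11223344) gives 0x44332211, and swapping twice returns the original value.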