Inline Functions: Advantages, Disadvantages, Performance and Usage Guidelines

Inlining is an optimization technique used by compilers. One can simply prepend the inline keyword to a function's declaration to request that the function be inlined. An inline function asks the compiler to insert the complete body of the function wherever that function is used in the code.
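
As a minimal sketch of what this looks like in C++ (the names `add` and `sum_of_three` are made up for illustration), the keyword goes in front of the function definition. Whether the calls are actually expanded in place is still up to the compiler:

```cpp
// Hypothetical example: a small function marked inline.
inline int add(int a, int b)
{
    return a + b;
}

// A caller. If the compiler inlines add(), this behaves as if it
// had been written: return (x + y) + z;
int sum_of_three(int x, int y, int z)
{
    return add(add(x, y), z);
}
```

Calling `sum_of_three(1, 2, 3)` gives the same result whether or not the calls to `add` were inlined; only the generated machine code differs.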

Advantages :-
1) It avoids function call overhead.
2) It also saves the overhead of pushing and popping variables on the stack during a function call.
3) It also saves the overhead of returning from a function.
4) It increases locality of reference by making better use of the instruction cache.
5) After inlining, the compiler can also apply intraprocedural optimization, if enabled. This is the most important one: the compiler can now focus on dead code elimination, give more attention to branch prediction, perform induction variable elimination, etc.
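
Point 5 can be illustrated with a small sketch. The names `scale` and `keep_as_is` are hypothetical, and whether a given compiler really performs the simplification depends on its optimization settings:

```cpp
// Hypothetical helper: the branch taken depends on a parameter.
inline int scale(int value, bool doubled)
{
    if (doubled) {
        return value * 2;
    }
    return value;
}

// The call site passes a constant. Once scale() is inlined here, an
// optimizing compiler can see that 'doubled' is always false and may
// remove the multiply branch as dead code, leaving in effect:
//     return value;
int keep_as_is(int value)
{
    return scale(value, false);
}
```

Without inlining, the compiler looking at `keep_as_is` alone only sees a call and cannot remove the branch inside `scale`.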

Disadvantages :-
1) It may increase the size of the calling function so that it no longer fits in the cache, causing lots of cache misses.
2) After inlining, if the number of variables that are going to use registers increases, they may create overhead on register utilization.
3) It may cause compilation overhead: if somebody changes the code inside an inline function, then all calling locations must also be recompiled.
4) If used in a header file, it will make your header file larger and may also make it unreadable.
5) If somebody uses too many inline functions, resulting in a larger code size, it may cause thrashing in memory: more and more page faults, bringing down your program's performance.
6) It is not useful for embedded systems, where a large binary size is not acceptable at all due to memory size constraints.

Performance : -
Now for the topic most people are interested in: performance.
In most cases inline functions boost performance if used cautiously, since they save a lot of the overhead discussed in the Advantages section above. But as we have also discussed the disadvantages, one needs to be very cautious while using them. Modern compilers inline functions automatically, so there is no need to request it explicitly in most cases. Placing the inline keyword only gives the compiler a hint that this function can be optimized by inlining; it is ultimately the compiler's decision whether to inline it. There are ways to instruct the compiler more forcefully, though: for example, one can use __forceinline when working with Microsoft Visual C++. I suggest not using this keyword unless you are very sure about the performance gain. Making a function inline may or may not give you a performance boost; it all depends on your code flow too. Don't expect a magical performance boost from prepending the inline keyword to a function, as most compilers nowadays do that automatically.
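
For readers who want to try forced inlining, here is a hedged sketch of a portability wrapper. `FORCE_INLINE` and `clamp_to_byte` are made-up names; `__forceinline` is the Visual C++ keyword mentioned above, and `__attribute__((always_inline))` is the GCC/Clang counterpart, which the article itself does not cover:

```cpp
// FORCE_INLINE is a hypothetical macro name used only in this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   // fall back to the plain hint
#endif

// A small function that is a reasonable inlining candidate.
FORCE_INLINE int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```

The function behaves the same with or without the macro; only measure-and-compare can tell whether forcing the issue helps.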

As we have seen, inline functions can help performance, but one has to use them with extreme caution.

I have prepared a few guidelines for its use.
Usage Guidelines :-
1) Always use an inline function when you are sure it will give a performance gain.
2) Always prefer inline functions over macros.
3) Don't inline functions with a large code size; one should only inline small functions to gain performance.
4) If you want to inline a member function of a class, then prefer to use the inline keyword outside the class, with the function definition.
5) In C++, member functions declared and defined within the class are inline by default, so there is no need to specify the keyword in such cases.
6) Your function will not be inlined if the caller and the function use different exception handling models, for example if the caller uses C++ exception handling and your inline function uses structured exception handling.
7) Most compilers will not inline recursive functions, but the Microsoft Visual C++ compiler provides a special pragma for it, i.e. pragma inline_recursion(on), and one can also control its limit with pragma inline_depth.
8) If the function is virtual and it is called virtually, then it will not be inlined. So take care in such cases; the same holds true for calls through function pointers.
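
Guideline 2 is easiest to see with an argument that has a side effect. The sketch below is illustrative only; `SQUARE_MACRO`, `square` and `next_value` are made-up names:

```cpp
// Macro version: the argument text is pasted in twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Inline function version: the argument is evaluated exactly once,
// and it is type-checked like any other function argument.
inline int square(int x)
{
    return x * x;
}

// Helper that counts its own calls, to make double evaluation visible.
inline int next_value(int& calls)
{
    ++calls;
    return 3;
}
```

With `int calls = 0;`, the expression `SQUARE_MACRO(next_value(calls))` calls `next_value` twice, while `square(next_value(calls))` calls it once. Both produce 9 here, but with an argument whose value changes between evaluations the macro would silently compute the wrong thing.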

For more such info please visit my technical blog :-
http://www.tajendrasengar.blogspot.com/2010/03/what-is-inline-function-in-cc.html

That's it from my side, I hope you enjoyed reading the post.
That's a decent article. As a rule of thumb, only inline functions that are 3-8 lines long.
Where did 3 and 8 come from?
It should be said that profiling is an important step in performance assessment and improvement. Nor will inlining make up for poor algorithm design.
2) After in-lining function if variables number which are going to use register increases than they may create overhead on register variable resource utilization.
What? Seriously, I have no idea how to interpret this.

3) It may cause compilation overhead as if some body changes code inside inline function than all calling location will also be compiled.
Correction: any source that includes the header where the inlined function is defined will have to be recompiled if the function (or anything else in the header, for that matter) changes.

No mention of decreased CPU cache performance with increased binary size? For shame.
2) After in-lining function if variables number which are going to use register increases than they may create overhead on register variable resource utilization.


My guess: After inlining a function, the number of variables which are going to use registers could increase, possibly creating an overhead on register utilization.
Hello friends, this is Tajendra again.
I think point (2) raised some doubts in the current discussion, so let me cover it in detail:-

2) After in-lining function if variables number which are going to use register increases than they may create overhead on register variable resource utilization.


Details:- As we know, after inlining a function its whole source code gets inserted at the calling point.
So the total number of variables in use also increases, and with it the number of registers needed for those variables. If the number of variables increases drastically after inlining, it will surely cause overhead on register utilization. To avoid this type of problem one should always inline functions with great caution. For that, see the usage guidelines in my article above.

Also if you want to know how to improve performance for small local chunk allocation then check :-
http://www.tajendrasengar.blogspot.com/2010/02/how-to-allocate-memory-dynamically-on.html
The decrease in CPU performance is directly related to cache misses, which I have covered in the very first point of my disadvantages section.
The impact of a larger binary size is covered by point 5 of my disadvantages section.

Please feel free to post if you have any more doubts.
after inlining a function whole source code get inserted at the calling point.
So the total variables going to be used also get increased. So the number of register going to be used for the variables will also get increased.
That's very flimsy logic. So you're saying that this code
f();
int a=f2();
int b=f3();
needs a CPU with more registers than this?
 
f();


which i have covered in my very first point of disadvantages section.
Alright, never mind, then.

5) If somebody used too many inline function resultant in a larger code size than it may cause thrashing in memory. More and more number of page fault bringing down your program performance.
By the time a program becomes so large that the OS needs to use swap space just to know what the program should do next, I think performance will be the least of your concerns.
Let's look at my point more precisely.
Suppose we have a function f():

int f1()
{
    // do something
    // this function uses 10 variables
    return 0;
}

int f()
{
    // do something
    // this function uses 20 variables
    return f1(); // calling function f1()
}


In the above case function f() uses 20 variables at most. Now if f1() is inlined,
the variable count of function f() becomes 30. So more inlining results in more variables to be managed.
Like the above example, we can have much more complicated situations too. Therefore we can say that:-
"The added variables from the inlined function may consume additional registers, and in an area where register pressure is already high this may force spilling, which causes additional RAM accesses."




I think you're confusing "CPU registers" and "stack space". If a function declares 40 variables, it's not necessarily going to use 40 registers. Mainly because the CPU may not even have that many registers. x86, for instance, has nowhere near that many. And we're not even taking into account types that don't fit in a register.

As for stack space, there's probably less cost when inlining than when calling, because the program doesn't need to save the CPU state (instruction pointer, stack pointer, etc.).
What he's trying to say is that the compiler's attempt at optimizing away memory accesses by storing variables in registers may be defeated if the inline function has variables in it. (If the function were not inlined, registers could be used, but once the function is inlined, there aren't enough registers available because the "calling" function is already using them for its own optimized variables.)
I suggest that you proofread your article and fix the typos. There are a few. Also, for this type of article I would suggest that you refer explicitly to the C++ standard, as some of the statements you have made are highly questionable and/or confusing.

1) Always use inline function when your are sure it will give performance.

I don't see how I could possibly know in all cases. I don't see how this is a useful guideline. It is also conflicting since you said that the inline keyword is a hint and that the compiler could reject the request anyway.

4) If you want to inline a function in class, then prefer to use inkine keyword outside the class with the function definition.

Why?

6) Its not useful for embeded system where large binary size is not preferred at all due to memory size constraints.

How does inlining increase binary size? You need to explain this. I've been writing embedded programs for years and have never heard of or seen any coding standard that suggests this. I would have thought the opposite is true, since inlining eliminates some of the jump, push, and pop instructions that are required to make a function call. Please clarify that statement.
How does inlining increase binary size?

Yeah! How could copying the code from a function wherever it's called possibly increase binary size? If you call an inline function 100 times, then the compiler copies its code in 100 times (with modifications, obviously).
1) Always use inline function when your are sure it will give performance.


That's actually a good point: how can we make sure that inlining a function will improve performance?

As we have discussed, inline functions have both advantages and disadvantages. The bottom line is that we need to use them correctly.
So whenever you make a function inline, you should be sure about its impact.
If you forcibly inline a function with a large code size, it will result in larger code and may hurt performance.

So, if you are targeting performance, then you must profile your code to see the impact.
Premature inlining of a function may hurt performance.
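
A rough way to check the impact is to time the same loop with and without inlining. The sketch below is only a starting point under several assumptions: `NO_INLINE`, `step_inline`, `step_call`, `Timing` and `time_loop` are made-up names, the `#else` branch assumes a GCC- or Clang-compatible compiler, and a real measurement should use an optimized build and a proper profiler:

```cpp
#include <chrono>
#include <cstdint>

// NO_INLINE is a hypothetical macro that asks the compiler NOT to inline.
#if defined(_MSC_VER)
#  define NO_INLINE __declspec(noinline)
#else
#  define NO_INLINE __attribute__((noinline))  // assumes GCC or Clang
#endif

// Two functions with identical bodies; only the inlining request differs.
inline std::uint64_t step_inline(std::uint64_t x) { return x * 3 + 1; }
NO_INLINE std::uint64_t step_call(std::uint64_t x) { return x * 3 + 1; }

struct Timing {
    std::uint64_t result;   // returned so the loop cannot be optimized away
    double milliseconds;    // wall-clock time for the whole loop
};

// Runs fn 'iterations' times, feeding each result into the next call.
template <typename Fn>
Timing time_loop(Fn fn, int iterations)
{
    auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 1;
    for (int i = 0; i < iterations; ++i) {
        acc = fn(acc);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return Timing{acc, elapsed.count()};
}
```

One would call it as `time_loop([](std::uint64_t x) { return step_inline(x); }, 100000000)` and again with `step_call`, then compare the `milliseconds` fields. The lambdas are used instead of plain function pointers because, as noted in guideline 8, calls through function pointers tend not to be inlined.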





4) If you want to inline a function in class, then prefer to use inline keyword outside the class with the function definition.


I regard this as a best practice.

Let's take a code snippet for it :-
 class Test {
 public:
     void func1();  // <----- best practice: do not use the inline keyword here
 };

 inline void Test::func1()  // <----- best practice: put the inline keyword here
 {
     // do something
 }


Reasons favoring it :-
Suppose you want to share class Test. The other party is only interested in your functionality, that is, the public API. A user does not want implementation complexity to be described in your exposed class.
So it's better not to expose other details which are purely implementation matters.
It also favors the concept of abstraction, i.e. show only relevant content and hide implementation details.





How does inlining increase binary size?
6) Its not useful for embeded system where large binary size is not preferred at all due to memory size constraints.


I want to cover both of the above quotes with this post:-
Starting with binary size :-
Too much inlining increases the size of the code and, as a result, produces a larger binary file.
For example, if a process has 100 inline functions, each of which expands to 100 bytes of executable code and is called from 100 locations, that's an increase of about 1MB.
And who knows whether this 1MB is going to cause problems? This last 1MB could cause the system to "thrash," and that could slow things down.
In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.
This is the notion of code bloat, as described above.
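
The arithmetic behind that figure can be written down as a small sketch. `inline_growth_bytes` is a made-up helper, and it is only a rough upper bound: it ignores the call instructions that inlining removes and any further shrinking the optimizer achieves at each call site:

```cpp
#include <cstddef>

// Rough upper bound on added code: every call site gets its own copy.
constexpr std::size_t inline_growth_bytes(std::size_t functions,
                                          std::size_t bytes_per_expansion,
                                          std::size_t call_sites_each)
{
    return functions * bytes_per_expansion * call_sites_each;
}

// 100 functions x 100 bytes x 100 call sites = 1,000,000 bytes, about 1MB.
static_assert(inline_growth_bytes(100, 100, 100) == 1000000,
              "the example in the text adds up to roughly 1MB");
```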

Now coming to the embedded systems point: most embedded systems have memory size constraints. So the first priority is to keep the binary size within those limits, i.e. memory optimization. After that one can optimize for performance.

In other words, if the executable size is too big, the system might spend most of its time going out to disk to fetch the next chunk of code.
Again with that? Do you realize just how big the executable would have to be for the OS to have to resort to swap space just to hold its code?
It's not that the executable is placed in swap space; it's that the variables need to be stored there...
First, it's very unlikely that the stack would end up in swap. Second, heap data is unrelated to binary size.