Inlining

Feb 8, 2014 at 5:13pm

Hey guys,

So I wrote a program that is supposed to demonstrate the improved efficiency of inline functions. My original program multiplies two arrays of doubles element by element (one that runs from 100 to 10000, and the other running from 10000 to 100) and measures the time needed to perform the calculation. I then inlined the function, hoping to avoid the overhead of the call and return mechanism and thereby make the inline version faster.

When I run both programs, the measured time varies from run to run. For example, using the inline version, I will get 327 clock cycles for one trial and 311 for the next. This affects the time difference between the two programs, sometimes making the inline program slower than the non-inline one. Any suggestions on how I can ensure that the inline version is faster than the non-inline one?

Any help would be greatly appreciated.

//Program 1 - Non-inline
#include <iostream>
#include <ctime>

using namespace std;

void NOT_INL() {
    double arr1[11000];
    double arr2[11000];
    double arr3[11000];

    arr1[0] = 100;
    arr2[0] = 10099;
    int i;
    for(i = 0; i < 9999; i++) {
        arr1[i] = arr1[0] + i;
        arr2[i] = arr2[0] - i;
        arr3[i] = arr1[i] * arr2[i];
        cout << arr3[i] << endl;
    }
}

int main() {
    double diff;

    clock_t start = clock();
    cout << "start: " << start << endl;

    NOT_INL();

    clock_t end = clock();
    cout << "end: " << end << endl;
    diff = end - start;
    cout << "Elapsed time: " << diff << endl;
} ///:~



//Program 2 - Inline
#include <iostream>
#include <ctime>

using namespace std;

inline void INL() {
    double arr1[11000];
    double arr2[11000];
    double arr3[11000];

    arr1[0] = 100;
    arr2[0] = 10099;
    int i;
    for(i = 0; i < 9999; i++) {
        arr1[i] = arr1[0] + i;
        arr2[i] = arr2[0] - i;
        arr3[i] = arr1[i] * arr2[i];
        cout << arr3[i] << endl;
    }
}

int main() {
    double diff;

    clock_t start = clock();
    cout << "start: " << start << endl;

    INL();

    clock_t end = clock();
    cout << "end: " << end << endl;
    diff = end - start;
    cout << "Elapsed time: " << diff << endl;
} ///:~


Feb 8, 2014 at 5:35pm

Smac89 (1727)

See this post:
http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops

If this works, please post the results here; I am curious to see them.

EDIT: Regarding your question about inlining a function: the inline keyword is just a hint to the compiler to inline the function. The purpose of inlining is to avoid the cost of a function call. The compiler takes the function body and places it where the function was called, which eliminates the call itself. However, I would not worry too much about this: since you are only calling this function once, the gain is insignificant.

Last edited on Feb 8, 2014 at 5:42pm

Feb 8, 2014 at 6:24pm

helios (17607)

What smac89 forgot to mention is that the function you're timing spends nearly all of its time in its loop. Inlining makes sense when a function is called many times. For example, if you were to move the loop to the outside of the function:

//Before:
//Function:
for (i = 0; i < 9999; i++){
    //...
}
//Call:
function();

//After:
//Function:
//...
//Call:
for (i = 0; i < 9999; i++){
    function();
}

(Assume that things have been moved around appropriately for the above to make sense.)
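To make the "After" layout concrete, here is a minimal sketch of one way to arrange it. The names multiply_pair and sum_products are hypothetical, not taken from the original program, and the cout calls are deliberately left out of the loop so that I/O does not drown out the call overhead being measured.

```cpp
#include <cstddef>

// Hypothetical helper: the per-element work that used to sit inside the loop.
inline double multiply_pair(double a, double b) {
    return a * b;
}

// The loop now lives at the call site, so the small function is called n times.
double sum_products(std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double a = 100.0 + i;    // ascending values, like arr1 in the original
        double b = 10099.0 - i;  // descending values, like arr2 in the original
        total += multiply_pair(a, b);
    }
    return total;  // returning the sum keeps the work from being discarded
}
```

With this shape the function body is tiny compared to the cost of a call, which is the situation where inlining has a chance of showing up in a measurement.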

Also, clock() measures intervals of 1/CLOCKS_PER_SEC seconds, not CPU clocks. On Windows, CLOCKS_PER_SEC == 1000. Regardless of that, the function has a limited resolution. If timed values oscillate between, say, 100 and 115, it means one of two things, or a combination of both:
1. The true time is somewhere between those two. The precision limit doesn't allow more accurate timing.
2. Other processes are using CPU time, throwing the measurement off.
#1 can be mitigated by timing longer operations. If function() takes 100 ms plus or minus 15 ms (that's the approximate accuracy of the Windows clock), time 100 runs of the function. The oscillation will never be greater than the clock precision, so instead of an error of 15%, you'll get an error of 0.15%.
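A rough sketch of the "time many runs" idea, assuming the work can be wrapped in a callable. mean_seconds is a hypothetical helper name; it uses the same clock() as the programs above, so the resolution limit still applies to the total, just not to each run.

```cpp
#include <ctime>

// Hypothetical helper: calls `work` `runs` times and returns the mean
// time per run in seconds, as measured by clock().
template <typename F>
double mean_seconds(F work, int runs) {
    const std::clock_t begin = std::clock();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const std::clock_t end = std::clock();
    // The timer error is paid once for the whole batch, then divided by runs.
    return double(end - begin) / CLOCKS_PER_SEC / runs;
}
```

It could be called as mean_seconds([]{ NOT_INL(); }, 100). Keep in mind that an optimizer may remove work whose result is never used, so the timed code should produce something observable.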

Feb 8, 2014 at 6:42pm

JLBorges (13770)

Other than for the ODR related rules, the inline keyword is just a hint.

Implementations are free to inline functions that are not specified inline, and are free not to inline functions that are specified as inline.

In practice, as far as inline expansion of functions is concerned, optimizers simply ignore the inline keyword.

Feb 8, 2014 at 7:45pm

Dcull (20)

Sorry if I'm not understanding, helios, but you're suggesting I put a for loop of 100 iterations around my INL() and NOT_INL() in my int main(), to call the functions many times because that is when the effects of inlining are observable?
I ran each one for 100 iterations, giving 28423 for non-inline and 28501 for the inline, showing that the inline takes longer?

JLBorges, again, sorry if I misunderstand, but you're saying that the compiler can just ignore my inline keyword altogether if it wants to?

Feb 8, 2014 at 8:01pm

JLBorges (13770)

> but you're saying that the compiler can just ignore my inline keyword altogether if it wants to?

It can ignore the inline keyword in deciding which functions are to be inlined.

It can't ignore the other rules about inline - for instance, the rule that there can be more than one (identical) definition of an inline function with external linkage (one per translation unit) in the program.

With most compilers, the inlining behaviour can be explicitly controlled by using specific compiler options and/or pragma directives. For instance, the -fno-default-inline g++ option.

count_bits.cpp

int count_bits2( unsigned int n ) ;

int count_bits( unsigned int n )
{
    if( n == 0 ) return 0 ;
    else return n%2 + count_bits2( n/2 ) ;
}

http://coliru.stacked-crooked.com/a/aa7953dc42bc50f4

count_bits2.cpp

int count_bits( unsigned int n ) ;

int count_bits2( unsigned int n )
{
    if( n == 0 ) return 0 ;
    else return n%2 + count_bits( n/2 ) ;
}

http://coliru.stacked-crooked.com/a/f5f3309f6603e569

main.cpp

#include <iostream>
#include <ctime>

int count_bits( unsigned int n ) ; 

int count_bits_inlined( unsigned int n )
{
    if( n == 0 ) return 0 ;
    else return n%2 + count_bits_inlined( n/2 ) ;
}


int main()
{
    unsigned int MAX = 1024*1024*16 ;
    
    {
        long long cnt = 0 ;
        
        auto begin = clock() ;
        for( unsigned int i = 0 ; i < MAX ; ++i ) cnt += count_bits_inlined(i) ;
        auto end = clock() ;
        
        std::cout << "count: " << cnt << "        inlined: " << double(end-begin) / CLOCKS_PER_SEC << " seconds\n" ;
    }
    
    {
        long long cnt = 0 ;
        
        auto begin = clock() ;
        for( unsigned int i = 0 ; i < MAX ; ++i ) cnt += count_bits(i) ;
        auto end = clock() ;
        
        std::cout << "count: " << cnt << "    not inlined: " << double(end-begin) / CLOCKS_PER_SEC << " seconds\n" ;
    }
}


http://coliru.stacked-crooked.com/a/6a791e731c05882b

Output:

count: 201326592        inlined: 0.59 seconds
count: 201326592    not inlined: 1.19 seconds


Last edited on Feb 8, 2014 at 8:55pm

Feb 8, 2014 at 9:17pm

Smac89 (1727)

Specifying the inline keyword is never a guarantee that the compiler will inline the function. Even with MSVC++ compilers, where the __forceinline keyword provides a stronger hint that the function should be inlined, inlining is not guaranteed. They are all still hints.
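For experiments, the compiler-specific spellings can be wrapped in macros. This is only a sketch: FORCE_INLINE, NO_INLINE and the two square functions are hypothetical names made up for illustration, while __forceinline, __declspec(noinline) and the GCC/Clang attributes are the real compiler extensions they map to.

```cpp
// Hypothetical macro names; what they expand to is compiler-specific.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
  #define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__)  // also defined by Clang
  #define FORCE_INLINE inline __attribute__((always_inline))
  #define NO_INLINE    __attribute__((noinline))
#else
  #define FORCE_INLINE inline  // fallback: plain hint only
  #define NO_INLINE            // fallback: no control at all
#endif

// Same body twice, so a benchmark can compare the two call styles.
FORCE_INLINE int square_inlined(int x) { return x * x; }
NO_INLINE    int square_called(int x)  { return x * x; }
```

Marking one copy as not inlinable is probably the more useful half for a benchmark, since otherwise the optimizer may inline both versions and the comparison measures nothing.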

You can, however, try the optimization I posted a link to, which makes use of branch optimization and memory alignment and can make the code run faster.

There are also compiler hints for the gcc compiler such as:
__builtin_prefetch(...)

which you can try
http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Other-Builtins.html#Other-Builtins
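As a rough illustration of how that builtin is used, here is a small sketch. sum_with_prefetch is a hypothetical function, and the prefetch distance of 16 elements is an assumed value that would need tuning on real hardware. For a simple linear scan like this one the hardware prefetcher usually does the job already, so do not expect a speedup without measuring.

```cpp
#include <cstddef>

// Hypothetical example: sums an array while hinting upcoming elements into cache.
double sum_with_prefetch(const double* data, std::size_t n) {
    const std::size_t ahead = 16;  // assumed prefetch distance, not a tuned value
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        // Arguments: address, 0 = prefetch for reading, 1 = low temporal locality.
        if (i + ahead < n) {
            __builtin_prefetch(&data[i + ahead], 0, 1);
        }
#endif
        total += data[i];
    }
    return total;
}
```

On compilers without the builtin the guard compiles the hint away and the function is just a plain sum.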
