I am trying to understand the differences between processor time (CPU time) and the real time (Wall clock time). I found a very good example here: http://en.cppreference.com/w/cpp/chrono/c/clock
#include <iostream>
#include <iomanip>
#include <chrono>
#include <ctime>
#include <thread>

// the function f() does some time-consuming work
void f()
{
    volatile double d = 0;
    for(int n=0; n<10000; ++n)
        for(int m=0; m<10000; ++m)
            d += d*n*m;
}

int main()
{
    std::clock_t c_start = std::clock();
    auto t_start = std::chrono::high_resolution_clock::now();
    std::thread t1(f);
    std::thread t2(f); // f() is called on two threads
    t1.join();
    t2.join();
    std::clock_t c_end = std::clock();
    auto t_end = std::chrono::high_resolution_clock::now();

    std::cout << std::fixed << std::setprecision(2) << "CPU time used: "
              << 1000.0 * (c_end-c_start) / CLOCKS_PER_SEC << " ms\n"
              << "Wall clock time passed: "
              << std::chrono::duration<double, std::milli>(t_end-t_start).count()
              << " ms\n";
}
At one point a thread is created with the function f() passed as an argument: std::thread t1(f)
In the function f(), a double variable is declared with the volatile qualifier: volatile double d = 0;
As far as I know, when we use the qualifier "volatile" we tell the compiler not to optimize the code where the variable is used, because, for example, an external interface could change the value of the variable.
However for me this is not the case in this example... So I don't understand why "d" has the qualifier volatile in this example. Does anybody know?
Thanks.
> when we use the qualifier "volatile" we tell the compiler not to optimize the code where the variable is used.
No. Operations on volatile qualified variables can be optimised.
For example, certain forms of strength reduction would still be possible
(replacing integer division by a power of 2 with a cheaper shift instruction).
Reads from and writes to volatile objects are part of the observable behaviour of the program.
This means that, with volatile int i = 7 ;
the sequence of statements ++i ; ++i ; ++i ; ++i can't be rewritten as i += 4 ;
and for( int j = 0 ; j < 10 ; ++j ) i += j ; can't be rewritten as i += 45 ;
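A minimal sketch of the two cases just described, in case it helps to try them in a compiler explorer. The function names are made up for illustration, and the increments are written as i = i + 1 because ++ on a volatile operand is deprecated since C++20. With optimisation enabled, one would expect the generated code to keep a separate load and store for every update; the results themselves are the same as for a plain int.

```cpp
// Hypothetical helper names, for illustration only.

// Four separate updates of a volatile int.
// Each statement is one read and one write of i, and every one of
// those accesses is part of the observable behaviour of the program.
int four_increments_volatile()
{
    volatile int i = 7;
    i = i + 1;
    i = i + 1;
    i = i + 1;
    i = i + 1;
    return i; // 7 + 4
}

// The loop from the post: adds 0 + 1 + ... + 9 = 45 to i,
// one volatile read and one volatile write per iteration.
int loop_sum_volatile()
{
    volatile int i = 7;
    for (int j = 0; j < 10; ++j)
        i = i + j;
    return i; // 7 + 45
}
```

The values returned are 11 and 52 either way; what volatile changes is how many memory accesses the compiler has to emit to get there.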
> However for me this is not the case in this example... So I don't understand why "d" has the qualifier volatile in this example. Does anybody know?
The function is being used by 2 threads, I am guessing that is a big factor. Not sure what happens if the calls to it are interleaved, would the value of d go crazy? Does d have a separate representation in each thread, or does volatile enforce that somehow? I am guessing it doesn't. There is no mutex to prevent different threads running that code.
So I don't know, just guessing and putting ideas out, as per my user tag :+)
What kind of compiler optimization is blocked when using the keyword volatile?
What volatile does precisely (with respect to accessing a volatile glvalue) is up to the implementation. As JLBorges indicates, in general, it means that accesses can't be re-ordered or removed; each access is considered a side-effect for the purposes of the optimizer.
The volatile keyword serves to prevent the accesses to d being combined, removed, or reordered, or more likely to prevent the function body from being optimized out entirely.
However even with volatile access, other kinds of optimisations are still possible; for example the LLVM compiler unrolls the inner loop, performing five increments per each of the 200 iterations. https://godbolt.org/g/jBWKN6
This is an example of optimisation (strength reduction) on an operation on a volatile variable that was mentioned earlier: division is replaced with a cheaper arithmetic shift, even for the volatile object.
void bar_with_volatile( volatile unsigned int& v ) { v /= 64 ; }
void bar_without_volatile( unsigned int& v ) { v /= 64 ; }
consult the vastly superior (light years ahead) JLBorges et al. for non guesses :+)
I am astounded
@JLBorges Thanks for this example:
int foo()
{
    int n = 1'000 ;
    int v = 0 ;
    for( int i = 0 ; i < n ; ++i )
        for( int j = 0 ; j < n ; ++j )
            ++v ;
    return v ;
}
vs
int foo_with_volatile_var()
{
    int n = 1'000 ;
    volatile int v = 0 ;
    for( int i = 0 ; i < n ; ++i )
        for( int j = 0 ; j < n ; ++j )
            ++v ;
    return v ;
}
I got it! So in the example they want the function "f()" to spend some time. To make sure that time is considerably long, they make the variable "d" volatile. Basically, this prevents the compiler from optimizing the accesses to this variable (combining, removing, or reordering them), so it cannot take any shortcut in the loop iterations.
Regarding the second example:
I see that in the link to the editor you sent me it shows also the assembly language behind:
shr dword ptr [rdi], 6
ret
But I don't understand what you mean by "cheaper arithmetic shift".
@Cubbi: Thanks for the observation. I will change it... Too many tabs open and a hasty copy-paste.
> But I don't understand what do you mean with cheaper arithmetic shift
Assume that on a particular hardware platform, for (unsigned integer) values in registers, an integer divide instruction takes 7 clock cycles and a bit-wise shift instruction takes 2 clock cycles. This is typical; on most platforms a divide instruction is more expensive (takes more time) than a shift instruction. Here, if the compiler can replace a divide instruction with a shift instruction that produces an equivalent result, the generated code would execute faster. https://en.wikipedia.org/wiki/Division_by_two#Binary
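To make "produces an equivalent result" concrete, here is a small sketch that compares the two operations directly for unsigned 32-bit values. The helper names are hypothetical and only serve the illustration; the cycle counts mentioned above are not measured here.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// What the source code asks for: an unsigned integer division by 64.
constexpr std::uint32_t divide_by_64(std::uint32_t v) { return v / 64u; }

// What the compiler may emit instead: a right shift by 6 bits (64 == 2^6).
constexpr std::uint32_t shift_right_by_6(std::uint32_t v) { return v >> 6; }

// Returns true if both forms agree for every value in [first, last].
// The loop counter is 64 bits wide so that last == 0xFFFFFFFF does not wrap.
bool agree_on_range(std::uint32_t first, std::uint32_t last)
{
    for (std::uint64_t n = first; n <= last; ++n) {
        const auto v = static_cast<std::uint32_t>(n);
        if (divide_by_64(v) != shift_right_by_6(v))
            return false;
    }
    return true;
}

// Compile-time spot checks.
static_assert(divide_by_64(4096u) == 64u, "4096 / 64 is 64");
static_assert(shift_right_by_6(4096u) == 64u, "4096 >> 6 is 64");
```

Because the two forms give the same value for every unsigned input, the compiler is allowed to pick whichever is faster on the target, whether or not the object is volatile.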
---- @JLBorges ----
I understand that one shift to the right of the binary representation divides the number by two, as I read in the link you provided.
I also read that the assembly instruction for shifting to the right is shr.
In the example you provided:
void bar_with_volatile( volatile unsigned int& v ) { v /= 64 ; }
void bar_without_volatile( unsigned int& v ) { v /= 64 ; }
They are actually executing this shift to the right:
shr dword ptr [rdi], 6
ret
Unfortunately I do not understand the rest of the line:
ptr [rdi], 6
ret
So I think I am missing something about what you say: "if the compiler can replace a divide instruction with a shift instruction"
----
I would like also to point out what @TheIdeasMan said:
> Not sure what happens if the calls to it are interleaved, would the value of d go crazy? Does d have a separate representation in each thread, or does volatile enforce that somehow? I am guessing it doesn't. There is no mutex to prevent different threads running that code
In the example there is no mutex. So the variable "d" is accessed by both threads and, at first sight, its value should be modified by both threads. However, the program seems to execute correctly.
Why does the variable "d" not go crazy?
> Unfortunately I do not understand the rest of the line:
To understand the rest of the line, you would need to learn about how arguments are passed to a function in the x86-64 architecture, and how a typical compiler implements passing references to objects to a function.
To understand that a shift instruction is used for the integer divide operation, the knowledge that shr is the shift right instruction is sufficient.
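As a rough sketch of how the rest of the line can be read, assuming the System V x86-64 calling convention (Linux, macOS) and a compiler that implements a reference parameter as the address of the object: the address of v arrives in the register rdi, "dword ptr [rdi]" is the 32-bit value stored at that address, and 6 is the shift count. The second function below is a hypothetical hand-written equivalent, not something the compiler produces.

```cpp
// The function from the example, written as v = v / 64 because compound
// assignment on a volatile operand was deprecated in C++20.
void bar_with_volatile(volatile unsigned int& v) { v = v / 64; }

// Hypothetical equivalent that spells out what the instruction does.
// Assumption: the reference is passed as a pointer in rdi, so
//   shr dword ptr [rdi], 6
// reads the 32-bit value at that address, shifts it right by 6 bits,
// and writes the result back to the same address.
void bar_as_pointer(volatile unsigned int* address_in_rdi)
{
    *address_in_rdi = *address_in_rdi >> 6;
}
```

Calling bar_with_volatile(x) and bar_as_pointer(&x) on objects holding the same value leaves the same result in both; the final ret simply returns to the caller.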
> So the variable "d" is accessed by both threads
No. The object has automatic storage duration; the two threads operate on two different objects.
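A small sketch that may make this easier to see. The function count_locally is a hypothetical stand-in for f(): it counts in a local volatile variable and reports the final value through a reference that is different for each thread. If the two threads shared one counter without synchronisation, the final values could not be relied upon; since each call gets its own object, both threads should always see exactly their own count.

```cpp
#include <functional>
#include <thread>

// Hypothetical stand-in for f(), for illustration only.
// 'local' has automatic storage duration: every call, on whichever
// thread it runs, gets its own fresh object.
void count_locally(int iterations, int& result)
{
    volatile int local = 0;
    for (int n = 0; n < iterations; ++n)
        local = local + 1;
    result = local; // each thread writes to its own result slot
}

// Runs the function on two threads, as in the original example,
// and checks that neither thread disturbed the other's counter.
bool two_threads_do_not_interfere(int iterations)
{
    int r1 = 0;
    int r2 = 0;
    std::thread t1(count_locally, iterations, std::ref(r1));
    std::thread t2(count_locally, iterations, std::ref(r2));
    t1.join();
    t2.join();
    return r1 == iterations && r2 == iterations;
}
```

A mutex would only be needed if the threads accessed the same object, for example a variable at namespace scope or one declared static inside the function.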