I was writing a function to measure CPU speed when I noticed something extremely odd:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
double measure_ratio(int divisions, int iterations);
int main(){
auto t0 = clock();
auto result = measure_ratio(10000, 100);
//auto result = measure_ratio(14142, 100);
auto t1 = clock();
std::cout << t0 << std::endl
<< t1 << std::endl;
std::cout <<
"Density: " << result << "\n""Time: " << (t1 - t0) / (double)CLOCKS_PER_SEC << " s\n";
return 0;
}
Output:
8
8
Density: 0.096656
Time: 0 s
This program takes ~4 seconds to finish (measured with a stopwatch). If I increase the divisions to 14142 the output is basically the same (time is still 0 s), but the program then takes ~8 seconds to finish.
Given say this, r = f1() + f2() * f3();
it is unspecified which order the functions are called in.
The only dependency is in how the results are combined.
It's perfectly reasonable to call f1() first and store the result in a register, even though it will be involved last in the full expression.
But in this case, the compiler can also see that measure_ratio() doesn't depend on t0, and the second call to clock() doesn't depend on result.
1 2 3
auto t0 = clock();
auto result = measure_ratio(10000, 100);
auto t1 = clock();
As with the single statement expression, the dependencies are only resolved at the cout statement, at which point t0,t1,result must have been evaluated (which they have been).
The compiler certainly doesn't know about the special significance of the temporal nature of the two calls to clock().
Yes, I ended up solving it with volatile.
It just surprised me that I had never encountered this behavior. It also seemed odd to me that the compiler would choose reorder a call between two calls with outside-observable behavior.
Question: What counts as behavior observable from outside for the purposes of this discussion? Obviously I/O, but what about mutex locking, atomic operations, etc.?
The least requirements on a conforming implementation are:
. Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
. At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
. The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [ Note: More stringent correspondences between abstract and actual semantics may be defined by each implementation. — end note ] http://eel.is/c++draft/intro.abstract#6
class C{
public:
double r;
double i;
C(double r = 0, double i = 0): r(r), i(i){}
C(const C &) = default;
C(C &&) = default;
C &operator=(const C &) = default;
C &operator=(C &&) = default;
double abs_sq() const{
return r * r + i * i;
}
double abs() const{
return sqrt(abs_sq());
}
const C &operator+=(const C &other){
r += other.r;
i += other.i;
return *this;
}
C operator+(const C &other) const{
auto ret = *this;
ret += other;
return ret;
}
const C &operator-=(const C &other){
r -= other.r;
i -= other.i;
return *this;
}
C operator-(const C &other) const{
auto ret = *this;
ret -= other;
return ret;
}
C operator-() const{
return {-r, -i};
}
C operator*(const C &other) const{
return {r * other.r - i * other.i, r * other.i + i * other.r};
}
const C &operator*=(const C &other){
*this = *this * other;
return *this;
}
};
bool is_in_set(const C &c, int iterations){
C z = c;
for (int i = 1; i < iterations; i++){
if (z.abs_sq() > 2 * 2)
returnfalse;
z = z * z + c;
}
return z.abs_sq() <= 2*2;
}
double measure_ratio(int divisions, int iterations){
std::uint64_t in = 0;
std::uint64_t total = divisions;
total *= total;
double increment = 4.0 / divisions;
for (int iy = 0; iy < divisions; iy++){
double y = iy * increment - 2;
for (int ix = 0; ix < divisions; ix++){
double x = ix * increment - 2;
if (is_in_set({x, y}, iterations))
in++;
}
}
return (double)in / (double)total;
}
Incidentally using measure_ratio(14142, 100) instead of measure_ratio(10000, 100) has these results:
0
10485
Density: 0.0966602
Time: 10.485 s
0
10500
Density: 0.0966602
Time: 10.5 s
Windows 10 Pro 64-bit, Intel Core i5 2400 @ 3.10GHz, 12.0GB Dual-Channel DDR3 @ 532MHz, compiled as 32-bit. Compiling for 64-bit the results are marginally slower.
Oh! I thought you were saying that 14142 and 10000 were taking the same amount of time! Now that would have been weird.
It's strange that you say that x86-64 runs slower than x86. I tested on a Ryzen and on a Haswell and on both the x86-64 build was substantially faster. 50% faster on the Ryzen and 37% faster on the Haswell. I also have a Nehalem, but it's a Linux box and I don't have any x86 VMs set up on it, so I could only test an x86-64 version.
I was a bit surprised that 64-bit code ran a bit slower than 32-bit on 64-bit Windows 10, no matter how many divisions were calculated.
For instance, 64-bit execution speed for 14142 divisions was about 11.6 seconds vs. 10.5 for 32-bit. 10000 divisions was similarly marginally slower when run as 64-bit: 5.7/5.8 vs. 5.25.
But then the machine I ran the code on is a bit of a Frankenstein, having been upgraded from 64-bit Win 7 to Win 10, with lots of hardware changes over the years. I don't know what effect that could have, and frankly don't really care,
It also seemed odd to me that the compiler would choose reorder a call between two calls with outside-observable behavior.
What outside observable behavior? Clock() is probably inline so the compiler knows what it does and determines that it has no outside observable behavior other than the return value.
Along those lines, you could also probably get around this by using another function in a different source file:
file1.cpp:
1 2 3 4
clock_t myClock()
{
return clock();
}
main.cpp:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
double measure_ratio(int divisions, int iterations);
// Compiler has no idea what this does while compiling main.cpp. For all it knows,
// myClock() launches a Mars probe and operates it for 18 years.
extern clock_t myClock();
int main(){
auto t0 = myClock();
auto result = measure_ratio(10000, 100);
auto t1 = myClock();
std::cout << t0 << std::endl
<< t1 << std::endl;
std::cout <<
"Density: " << result << "\n""Time: " << (t1 - t0) / (double)CLOCKS_PER_SEC << " s\n";
return 0;
}
Nope, not in the implementation I'm using. It's imported from a DLL.
Even if if wasn't, extracting the call out to a different translation unit should not help, since the compiler can still tell that t1 doesn't depend on the computation of result.
If it's reading from some memory location that hasn't been declared volatile then it might conclude that that there is no observable behavior. Of course all of this is speculation. It would depend on the specifics of the call to clock().
Nope, not in the implementation I'm using. It's imported from a DLL.
Well damn that's strange then! It sounds like a bug to me.
Even if if wasn't, extracting the call out to a different translation unit should not help, since the compiler can still tell that t1 doesn't depend on the computation of result.
But it doesn't know what side effects the function has so it must call the function in the order given.
Hmm. Maybe the issue is actually in measure_ratio(). If the compiler can determine that measure_ratio() has no side effects then it might decide to move it. Put another way, maybe the compiler moved the call to measure_ratio() down, rather than moving the call to clock() up.
Even if if wasn't, extracting the call out to a different translation unit should not help, since the compiler can still tell that t1 doesn't depend on the computation of result.
But it doesn't know what side effects the function has so it must call the function in the order given.
That was my fault when I wrote the OP. measure_ratio() is in the same translation unit, I just didn't want to add unnecessary details.
Put another way, maybe the compiler moved the call to measure_ratio() down, rather than moving the call to clock() up
Yes, exactly. Both clock() calls happened in the correct order (at least AFAICT). It's the measure_ratio() call that was moved to a later point. Basically it rewrote it to
This is a historied issue with C++, and remains for valid C++ reasons. For both some context and a solution, check out Chandler Carruth’s answer at SO: