After increasing the repetitions and adding a warm-up, I get nearly the same numbers on the GCC, Clang, and Intel compilers: sometimes valarray is slightly faster, sometimes the loop. (For Intel I used the GNU library, because using Intel's parallelized valarrays would be cheating.) MSVC (2013) is consistently 2.2 times slower for me.
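
A harness with a warm-up phase and averaged repetitions could look roughly like the following. This is only a minimal sketch, not the code that was actually measured: the helper `time_avg_seconds`, the two kernels, and the expression `a * b + a` are all made up for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <valarray>
#include <vector>

// Hypothetical helper (not from the original post): calls `work` `warmup`
// times without timing, then returns the average wall-clock time in seconds
// per call over `reps` timed calls.
template <typename F>
double time_avg_seconds(F&& work, int warmup, int reps) {
    for (int i = 0; i < warmup; ++i) work();   // warm caches, page in memory
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / reps;
}

// Illustrative kernel, valarray version: c = a * b + a (element-wise).
void kernel_valarray(const std::valarray<double>& a,
                     const std::valarray<double>& b,
                     std::valarray<double>& c) {
    c = a * b + a;
}

// The same illustrative kernel written as a plain loop over vectors.
// Assumes a, b, and c all have the same size.
void kernel_loop(const std::vector<double>& a,
                 const std::vector<double>& b,
                 std::vector<double>& c) {
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = a[i] * b[i] + a[i];
    }
}
```

To compare the two, call something like `time_avg_seconds([&] { kernel_valarray(a, b, c); }, 10, 1000)` and the equivalent for `kernel_loop` from `main`, then print both averages. In an optimized build, the results should also be used afterwards (for example, by printing a checksum) so the compiler cannot discard the work.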