Modern CPUs are pretty amazing

Related: https://cplusplus.com/forum/lounge/47826/
(You don't need to read it to understand this post.)

I spent Sunday playing around with OpenCV, comparing CUDA and CPU performance at feature extraction. I had done this years ago on a Core 2 Duo and a GeForce 9500 GT; back then the GPU was easily six times faster than the CPU at the overall task. With the same overall task on my current machine (3950X + RTX 3070), the GPU is not even twice as fast as the CPU! Starting from a set of 500 image files stored on a ramdisk, loading each one, computing SURF descriptors, and storing the results in a vector, the CPU completes in 33 seconds and the GPU in 21 seconds. Measuring only the actual processing and excluding the load times shows that although the CPU (running 32 threads) has 64% of the GPU's total throughput, its single-thread latency is about 50 times the GPU's. In other words, the same test run on a single CPU thread would take around 924 seconds.
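For reference, here's a minimal sketch of what the two paths look like, not the exact code I used; it assumes OpenCV built with the contrib modules (xfeatures2d, with nonfree enabled for SURF) and CUDA support, the helper names are mine, and the file listing, timing, CPU-side threading, and error handling are all left out.

#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <opencv2/xfeatures2d/cuda.hpp>
#include <string>
#include <vector>

// CPU path: load each image, compute SURF descriptors, store them in a vector.
std::vector<cv::Mat> extract_cpu(const std::vector<std::string> &files)
{
    auto surf = cv::xfeatures2d::SURF::create();
    std::vector<cv::Mat> results;
    for (const auto &f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        surf->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
        results.push_back(descriptors);
    }
    return results;
}

// CUDA path: same task, but detection and description run on the GPU.
std::vector<cv::Mat> extract_gpu(const std::vector<std::string> &files)
{
    cv::cuda::SURF_CUDA surf;
    std::vector<cv::Mat> results;
    for (const auto &f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        cv::cuda::GpuMat gpu_img(img);                // upload to the GPU
        cv::cuda::GpuMat gpu_keypoints, gpu_descriptors;
        surf(gpu_img, cv::cuda::GpuMat(), gpu_keypoints, gpu_descriptors);
        cv::Mat descriptors;
        gpu_descriptors.download(descriptors);        // download the result
        results.push_back(descriptors);
    }
    return results;
}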
Another interesting point is AMD's SMT scaling. The CPU only has 16 cores, but with 32 threads the task's throughput scales 28x over a single thread, so each core is doing 75% extra work pretty much for free.
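For the curious, the arithmetic behind that 75% figure, using the numbers above (the ~924 s single-thread estimate and the 33 s run with 32 threads):

#include <cstdio>

int main()
{
    const double single_thread_s = 924.0; // estimated single-thread time
    const double all_threads_s   = 33.0;  // measured time with 32 threads
    const int    physical_cores  = 16;

    double scaling  = single_thread_s / all_threads_s; // ~28x speedup
    double per_core = scaling / physical_cores;        // ~1.75, i.e. ~75% extra work per core from SMT
    std::printf("scaling: %.0fx, per physical core: %.2fx\n", scaling, per_core);
}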

I might see if I can rent a few hours on an EPYC or Threadripper machine and see if I can beat the GPU by just throwing enough cores at the problem. A coworker also mentioned that modern consumer Nvidia chips have gimped CUDA cores, so it would also be interesting to look into a shader reimplementation, but that's much more work than I'm willing to put in right now.
There have been a lot of performance gains over the past few generations of CPUs/GPUs. I'm waiting for generational improvement to plateau so I can upgrade without being upset that the next gen is even better.

I'm not as ancient as you guys, but I still get pretty impressed seeing how far computers have come since I was little - playing Halo: Combat Evolved on my mediocre laptop.
I can remember when a single core 1GHz CPU was considered amazing and blazing fast in comparison to what had come before it.
I'm waiting for generational improvement to plateau so I can upgrade without being upset that the next gen is even better.
That's probably going to take at least 10-20 years, but nowadays performance generation-to-generation is pretty stagnant. When Ryzen came out I upgraded to a 1700 from an Intel i5-3570 (the best non-K i5 of the Ivy Bridges; pretty much the sweet spot of bang for buck), and I could probably still be using it today. There hasn't been that much improvement in single thread performance. Maybe +50% in 9 years?
I can remember when a single core 1GHz CPU was considered amazing and blazing fast in comparison to what had come before it.


heh, yes... I have the manual for my 386 explaining just how amazingly powerful it is and what all I can now do...
The leap from 16 to 40+ MHz and moving the math onto the CPU felt so big back then, and the Pentium with its 1.5 processors was just a revolution.
That's probably going to take at least 10-20 years, but nowadays performance generation-to-generation is pretty stagnant.

Performance gains have been going through the roof from what I'm seeing. YouTuber benchmarks keep showing pretty sweet increases.

Nvidia graphics cards made a big splash going to the 30 series, and now their 40 series is too.

When AMD and Apple made waves with their powerful new CPUs, I hadn't seen that kind of hype around performance in a while. Then Intel came slamming back with 12th gen.

Meanwhile, I had a 7th gen Intel CPU, and I don't think there was a reason to upgrade from 7th gen until these newer 12th gen and Ryzen chips.


For a while there was no reason to upgrade, but now there's been substantial improvement for two generations of CPUs/GPUs in a row.
Note that I was talking about single thread performance, because that's what determines the worst case performance of a computer.

Performance gains have been going through the roof from what I'm seeing.
Only if you look at multithread performance. Single thread performance has only been inching along. And that makes sense, because getting more multithread performance is just a matter of having enough die area to cram your cores into, or transistors small enough to fit them into the existing area.

Nvidia graphics cards made a big splash going to the 30 series, and now their 40 series is too.
GPUs are massively parallel architectures, so they're also not covered by my previous statement. That said, the 40 series, or at least the 4090, seems to be about just increasing power at the cost of everything else in the card (performance/Watt, size, price, overall practicality, etc.). For all that, it doesn't offer incredible performance gains, IMO.

I wouldn't describe GPU performance gains generation-to-generation as "stagnant", but I also wouldn't say the 30 series is an outlier in terms of improvements compared to its predecessors. If anything, the 20 series was more revolutionary for introducing raytracing (although I'm still not convinced it's more than a gimmick to sell hardware).

When AMD and Apple made waves with their powerful new CPUs, I hadn't seen that kind of hype around performance in a while.
Ryzen's big deal wasn't really performance, it was core count; Intel had been resting on its laurels for years on that front. The original Ryzen 7s had mediocre single thread performance, but shone if you had a very heavily parallel and branchy task, such as compilation.

Apple's case is different, because they're working on an entirely different architecture than Intel or AMD. They can squeeze out more MIPS per Watt. However, just by looking at their dies, it's obvious they have a pretty crazy number of transistors. It's not too difficult to get more performance by just throwing more transistors at the problem, if you don't care about optimizing yields.
Only if you look at multithread performance.

I see, you're looking at physical improvements rather than architectural/logic improvements which try to reason their way to better performance.

Intel still hasn't adopted a smaller-nanometer process for their CPUs, have they?

If anything, the 20 series was more revolutionary for introducing raytracing (although I'm still not convinced it's more than a gimmick to sell hardware)

Raytracing looks pretty nice, and there are people remastering older games like Portal to have that beautiful raytracing.
Intel still hasn't adopted a smaller-nanometer process for their CPUs, have they?
Well, that's unclear, because they no longer refer to process nodes by feature size. I always thought it was sort of a bullshit value anyway, especially since a TSMC nanometer is not the same as an Intel nanometer or a Samsung nanometer. A more reliable measure would be sqrt(transistors/die_area).
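As a toy illustration of that metric (the numbers here are made up for the example, not any real chip):

#include <cmath>
#include <cstdio>

int main()
{
    // Hypothetical chip: 10 billion transistors on a 100 mm^2 die.
    const double transistors  = 10e9;
    const double die_area_mm2 = 100.0;

    // sqrt(transistors / area) gives a linear density (transistors per mm along one axis),
    // comparable across vendors regardless of how they name their nodes.
    double linear_density = std::sqrt(transistors / die_area_mm2);
    std::printf("%.0f per mm, i.e. an average pitch of ~%.0f nm\n",
                linear_density, 1e6 / linear_density);
}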

Raytracing looks pretty nice, and there are people remastering older games like Portal to have that beautiful raytracing.
Eh. It looks okay, but it doesn't blow me away, and for how much power and performance it costs, it really should.
I always thought it was sort of a bullshit value anyway, especially since a TSMC nanometer is not the same as an Intel nanometer or a Samsung nanometer

From what I remember reading, Intel's nanometer figure *was* accurate and it was TSMC's that wasn't - could be wrong though. Some independent group measured TSMC's transistor sizes.

Here's a similar article, though not by the same people I read before. This one just compares photos of the two CPUs, but the other one I read actually took measurements, I believe.

Could be wrong though, been a while since I read it.

https://hexus.net/tech/news/cpu/145645-intel-14nm-amdtsmc-7nm-transistors-micro-compared/

Eh. It looks okay, but it doesn't blow me away, and for how much power and performance it costs, it really should.

Yeah, it's not mind-blowing, but it can look pretty nice depending on the game and specific scenes. The real beauty comes from having all these graphics settings together. Turn off just one beauty setting and a game will probably look 90% the same; turn them all off and it'll look ugly.