The truth about higher clockspeeds lifting all boats

This isn’t a console warring thread. It’s to discuss the facts around higher clock speeds and whether they increase performance in a linear fashion, especially on RDNA GPUs, as Sony claimed for the PS5.

Digital Foundry did some experiments using two GPUs: one with a higher clock and lower CU count, and another with a lower clock but higher CU count, both having the same TFLOP value. Their results showed better performance from the lower clock speed and higher CU count.
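The paper-TFLOPS parity in that kind of experiment follows from the standard RDNA formula (64 shaders per CU, 2 FLOPs per clock via FMA). A quick sketch; the two configurations below are made-up numbers for illustration, not DF’s actual test cards:

```python
# FP32 TFLOPS for an RDNA GPU = CUs * 64 shaders/CU * 2 FLOPs/clock * clock (GHz) / 1000
def tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000

# Two hypothetical configurations with the same paper TFLOPS:
wide_and_slow = tflops(40, 1.5)    # more CUs, lower clock
narrow_and_fast = tflops(30, 2.0)  # fewer CUs, higher clock
print(wide_and_slow, narrow_and_fast)  # both 7.68
```

Same headline number, very different hardware balance — which is exactly why the benchmark results diverge.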

Techspot also did a comparison between two Navi cards: one overclocked to 2.1-2.2 GHz against the same card at stock clocks, an 18% clock speed increase over stock. The results showed poor scaling, with a best case of only 10% and other results below that.
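As a rough sanity check on those numbers, the best-case result captures only a bit over half of the theoretical linear speedup:

```python
# Scaling efficiency of the Techspot overclock result above:
# an 18% clock increase delivering at best ~10% more performance.
clock_gain = 0.18
fps_gain_best = 0.10
efficiency = fps_gain_best / clock_gain  # fraction of linear scaling achieved
print(f"{efficiency:.0%}")  # ~56%
```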

This indicates to me that the PS5’s high clock speeds aren’t going to give a linear performance profile for its GPU.

This, along with the fact that the PS5 has variable clocks and will downclock when graphically intensive games come along, is going to give the XSX a larger edge than the on-paper ~2 TFLOP gap indicates.

Together this tends to suggest to me that the PS5’s high clock speeds were reactionary, and that the PS5’s GPU was probably going to be a 2 GHz card as shown in the GitHub leak. There is no way Sony would have deliberately planned a console to use variable clocks and frequencies that don’t offer linear performance.

On the plus side for Sony, however, these clock speeds, along with the 36 CU GPU that mirrored the PS4 Pro, allowed the PS5 to jump out of the blocks with the initial group of games released. The shorter “time to triangle” that Cerny talked about was achieved for sure.


As I’ve said previously, it depends very much on the engine and how it’s designed. It’s a case of: do you want to do more things slightly slower, or fewer things slightly faster? And usually, even more so now, I would opt to do more things slightly slower. Having more CUs available is much better when you won’t fully utilise a single CU anyway, but do want more simultaneous tasking.


That totally depends on the current workload of the GPU and what part of it limits performance. If you have ALU-heavy shader code, increasing the frequency of the texture samplers will do nothing for how long the GPU needs to run that code. ALU-light code, like rendering a shadow map, will probably benefit from higher GPU frequencies in a near-linear fashion because it’s mostly bound by the performance of the fixed-function units in the GPU.
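That bottleneck argument can be illustrated with a toy model (my own simplification, not a real GPU profiler) where a pass’s time is set by its slowest stage, and only stages that run off the boosted clock get faster:

```python
# Toy bottleneck model: a pass takes as long as its slowest stage.
# Here ALU and fixed-function rates scale with core clock, while the
# memory/texture rate is independent of it.
def pass_time(alu_work, ff_work, mem_work, clock, mem_rate):
    alu_rate = clock  # ALU throughput scales with core clock
    ff_rate = clock   # fixed-function units also scale with clock
    return max(alu_work / alu_rate, ff_work / ff_rate, mem_work / mem_rate)

# Clock-bound pass: an 18% clock bump helps almost linearly.
base = pass_time(100, 10, 20, clock=1.0, mem_rate=1.0)
oc = pass_time(100, 10, 20, clock=1.18, mem_rate=1.0)
print(base / oc)  # ~1.18, near-linear gain

# Memory-bound pass: the same 18% clock bump does nothing.
base = pass_time(10, 10, 100, clock=1.0, mem_rate=1.0)
oc = pass_time(10, 10, 100, clock=1.18, mem_rate=1.0)
print(base / oc)  # 1.0, no gain
```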

But most GPU tasks these days are heavy on ALU calculations, and/or, in the case of UE5, circumvent huge parts of the fixed-function rasterization pipeline. In those cases more CUs will help more.

I would imagine most engines are designed to do as much concurrently as possible, right? PC GPUs tend to scale up by adding CUs rather than boosting clocks.

Those PC benchmarks are an interesting exercise but ultimately of limited use compared to Sony’s closed platform. However, I agree with you that the 36 CU but turbocharged design philosophy of the PS5 gave it the early edge in cross-generation titles; the evidence is there, but no one’s been doing the math on the major channels, instead opting for console-wars-style presentations. I am curious about RT performance though, since my understanding was that RT performance scales with CU count. I guess it comes down to whether Microsoft can make inroads with development tools fast enough to push the boat out before Sony puts out a PS5 Pro and employs the same short-term strategy to get devs on board. In that case, will Microsoft put out a Series XS X Ultimate X+ whatever, and does that defeat the purpose of a future-looking design with more CUs at lower clocks?

It was only a half truth, because they are not even lifting the clocks of all the GPU’s parts.

One very important aspect, the memory bandwidth, wasn’t touched by their overclock.

And also, while it does lift performance a bit, the increase stops being linear once the processor moves away from a strictly linear pipeline and multi-threaded execution comes into play (and the same was true on PC).

All in all, the PS5 consistently lands between a 5700 and a 5700 XT (which has just 4 extra CUs), and in some extreme cases even below a 5700 (likely due to having the same bandwidth but having to share it). That suggests simply adding 4 extra CUs to the PS5’s design would have resulted in a stronger, cheaper, cooler and smaller console than pushing for those clocks.

I’m not sure that, even on PS4, where the API seems much lower level than most (to the point they needed to add features to cripple the PS5 GPU to match it), games actively code against specific clocks or CU counts.

At least for games coded in the Xbox ecosystem, we can discard that hypothesis, because in BC mode (which runs in GCN mode) the SX is delivering 2x the GPU performance of the One X when the only 2x increase was in TFLOPS. That means the games, even without being aware of the new hardware at all, make near-perfect use of the increased CU count.
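A quick back-of-envelope check of that ~2x figure, using the public paper specs (One X: 40 CUs at 1172 MHz; Series X: 52 CUs at 1825 MHz; 64 shaders per CU, 2 FLOPs per clock):

```python
# FP32 TFLOPS = CUs * 64 shaders/CU * 2 FLOPs/clock * clock (MHz) / 1e6
def tflops(cus, mhz):
    return cus * 64 * 2 * mhz / 1e6

one_x = tflops(40, 1172)     # ~6.0 TFLOPS
series_x = tflops(52, 1825)  # ~12.15 TFLOPS
print(series_x / one_x)      # ~2.02x
```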


All I have to say is the Xbox One had higher GPU and CPU clock speeds compared to the PS4, and it did not close the gap or make much difference.


Even by relative percentage scaling the differences here are quite different. The PS4’s clock was only 6% slower than the XB1’s, but it had 50% more CUs and almost doubled the Xbox GPU in other features. This time the two consoles are very similar right down to their ROPs; the only real differentiators are the CU count and the custom hardware blocks. The PS5 has an 18% clock speed advantage but 16 fewer CUs (31%), and with the gap being wider on clocks it makes sense that we are seeing performance leads early on for PlayStation. The real question is how long devs are going to keep targeting old hardware, which benefits from raw clock speeds more than from additional parallelization (the PS4 Pro already has 36 CUs). Crysis is a prime example of old code designed for the wrong future, granted that game is an extreme case of betting on the wrong horse: Crytek expected CPU clocks to keep increasing instead of going multi-core, making the game a pain to run at high frame rates to this day.
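For what it’s worth, those percentages check out against the public specs (PS4: 18 CUs at 800 MHz; XB1: 12 CUs at 853 MHz; PS5: 36 CUs at up to 2230 MHz; XSX: 52 CUs at 1825 MHz), with each deficit measured against the larger value:

```python
ps4_cus, ps4_mhz = 18, 800
xb1_cus, xb1_mhz = 12, 853
ps5_cus, ps5_mhz = 36, 2230
xsx_cus, xsx_mhz = 52, 1825

print((xb1_mhz - ps4_mhz) / xb1_mhz)  # PS4 clock ~6% lower than XB1
print(ps4_cus / xb1_cus - 1)          # PS4 has 50% more CUs
print((ps5_mhz - xsx_mhz) / ps5_mhz)  # XSX clock ~18% lower than PS5
print(1 - ps5_cus / xsx_cus)          # PS5 has ~31% fewer CUs
```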


But the ESRAM makes it a non-1:1 scenario.

The X1 GPU was severely bandwidth starved if the ESRAM wasn’t used, and most third parties, especially early on, didn’t use it properly, resorting to just reducing buffer sizes until most of the work fit inside the ESRAM, instead of juggling data between main RAM and ESRAM as the system was designed for.

I do think there are some games that showcase how close performance could be if developers took their time (Rise of the Tomb Raider, which received special treatment on the Xbox One due to being a timed exclusive, for example, almost matching the PS4 in performance and graphical prowess).