I’m most interested to see whether the performance difference is actually 18% or more, because even though the math adds up to 10.2 teraflops, we know that overclocking has diminishing returns on how much you gain from it. Add heat into the mix, and we’ll just have to wait and see, I guess.
I feel like it could be even bigger than 20%. Look at the 5700 XT, which is basically what the PS5’s GPU is, apart from being RDNA 2 (possibly 1.5) and overclocked. There’s a video where they overclock it from 1750 MHz to 2150 MHz and it only gains 3-5 fps on average per game.
My understanding is: (a) the power draw is constant under load and does not depend on thermals, and (b) SmartShift is used to allocate some of the CPU’s power budget to the GPU. (a) means the cooling is sufficient to handle whatever the total system power draw/dissipation is. Taken together, (a) and (b) mean the total allowed power draw must be less than what’s needed to run the CPU at 3.5 GHz and the GPU at 2.1+ GHz simultaneously. We also know the power/frequency relationship from the 5700 XT, and Sony confirmed it when they said lowering frequency by a few percent lowers power by ~10%. At 3.5 GHz, Zen 2 will draw about 35-40 W. So the main unknown is the maximum power dissipation the system is built to handle. But we can speculate: if SmartShift moves 15-20 W from the CPU to the GPU, the CPU will need to operate at Vmin and its frequency will be around 2.5 GHz or thereabouts. The power management is dynamic, but there is some latency to react and change frequency and voltage on the SoC. Now if you run the CPU close to 3.5 GHz, the GPU may operate at ~2 GHz, so I expect the difference to be larger than 18% (that 18% is the very best case for the GPU, with the CPU much slower). Take this as me talking through my ass.
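To make the power/frequency point concrete, here’s a toy model. Near the top of the voltage/frequency curve, dynamic power scales roughly as frequency times voltage squared, and voltage itself has to rise with frequency, so power grows much faster than linearly. The exponent below is my assumption, not a measured figure for this chip:

```python
# Toy power/frequency model. A cubic exponent approximates
# P ~ f * V^2 with V rising roughly linearly in f; real curves
# near the voltage ceiling can be even steeper (which is how a
# "couple percent" clock drop could buy ~10% power, per Sony).

def relative_power(freq_ratio, exponent=3.0):
    """Power relative to max, for a clock at freq_ratio of max clock."""
    return freq_ratio ** exponent

for drop_pct in (2, 5, 10):
    r = 1 - drop_pct / 100
    saved = (1 - relative_power(r)) * 100
    print(f"{drop_pct}% lower clock -> ~{saved:.0f}% lower power")
```

With the cubic assumption a 2% clock drop only saves ~6%, so Sony’s claimed "few percent for ~10%" implies their curve is steeper than this sketch near the top.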
Even if you ignore overclocking, it’s simple math... you have a cache ($) or stream buffer, and most of the time the processor can do useful work, but every so often (for various reasons) the data is not available and you have to go to memory (on a modern SoC that can mean hundreds to thousands of cycles waiting for the data to arrive). Assume that 5% of your requests cause misses. Assume also that your memory system and fabric do not scale at the same rate as your processor. Effectively, the miss latency measured in cycles increases as you increase frequency (cycle count is a function of frequency). So although we are increasing frequency, on average each instruction takes more cycles to complete, because the number of cycles the processor spends idle waiting for data is also increasing, and that eats into the gains from the higher clock.
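A quick sketch of that effect, with made-up numbers (5% miss rate, 100 ns memory latency, reusing the 1750 MHz vs 2150 MHz clocks from the 5700 XT overclock mentioned earlier). Memory latency is fixed in wall-clock time, so at a higher clock a miss costs more cycles:

```python
# Toy memory-wall model: average wall-clock time per instruction.
# Misses cost a fixed number of nanoseconds regardless of clock,
# so clocking up shrinks only the "hit" portion of the time.

def ns_per_instruction(freq_ghz, miss_rate=0.05, hit_cycles=1.0,
                       mem_latency_ns=100.0):
    cycle_ns = 1.0 / freq_ghz
    hit_ns = hit_cycles * cycle_ns
    miss_ns = hit_ns + mem_latency_ns
    return (1 - miss_rate) * hit_ns + miss_rate * miss_ns

for f in (1.75, 2.15):
    print(f"{f} GHz: {ns_per_instruction(f):.3f} ns/instruction")

speedup = ns_per_instruction(1.75) / ns_per_instruction(2.15)
clock_bump = 2.15 / 1.75 - 1
print(f"~{(speedup - 1) * 100:.0f}% faster from a "
      f"{clock_bump * 100:.0f}% clock bump")
```

Under these assumptions a ~23% clock increase yields only about a 2% real speedup, which lines up with the tiny fps gains in that overclocking video.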
So why not just decrease memory latency? Well, that’s been the challenge for at least the last 30 years that I’ve been working in this area. For those not familiar, if you have time to burn this weekend, search for papers on "hitting the memory wall".
The PS5 is both a smart design and a deceptive one at the same time. It is true that it will run at its max clocks most of the time, but that is because when you are gaming you aren’t necessarily stressing both the CPU and GPU unless you are in an intense section of the game. So, for example, it can throttle the CPU during GPU-intensive spots or throttle the GPU during CPU-intensive spots. Developers can basically just work around having to max both out at the exact same time.
Here’s why it is deceptive. When you think of the “power” of the system, you are thinking about what the system can do when every last bit of it is pushed to its limits. Everyone uses the TF number as a measure of this. Keep in mind that a fixed-frequency system like the Xbox is the same TF when doing minimal work as when maxed out. The PS5, on the other hand, has a variable TF number, and they are only telling us the highest one. We don’t know the lowest TF number, and that is the number the system will technically hit in exactly the scenarios people have in mind when asking “how powerful is this system?”
To be honest, Sony’s TF number is useless without the lower bound. How much the PS5 throttles, and how close it is to a full-time 10.28 TF (fixed-clock) machine, we may never know. You’ll just have to trust Sony, I guess.
It’s a little hard to explain but makes sense once you get it. It’s actually quite clever in a way, but I can also see how it might lead to some difficulties with performance for games. Warning: long post.
Take a “normal” system like PS4 or Xbox Series X. In such a system, the clock speed of the CPU and GPU are both fixed. Now, the clock just tells you how many operations are performed per second. It doesn’t tell you what type of operations those are. And some operations are “harder work”, meaning they generate more heat, than others.
So let’s simplify and say there are “easy” operations we can call E and hard operations H. And let’s say the GPU is doing 1 billion operations per second (1 GHz). If it’s doing 1 billion E operations, then maybe the heat generated is 200 watts (totally made up). But if it’s 500 million E and 500 million H, then the system puts out 250 watts. And if it’s 1 billion H, then the system puts out 300 watts.
Then in that case, you need to design your cooling system for the “worst case scenario” of 300 watts. And you can’t risk making the clock too high. In the example above, turning the clock up to 1.5 GHz would mean you could do 1.5 billion E operations and still only put out 300 watts, which the cooling system can handle. But if you set your fixed clock to 1.5 GHz, then a game that wants to do 1.5 billion H operations will now generate 450 watts. So you have two choices:
1. Keep the clock lower, at 1 GHz. Leave some performance on the table in scenarios where it’s mostly E operations. You could do more given your 300 watt cooling capacity, but if the code changes to be H-heavy then the cooling would be insufficient.
2. Increase the size of the cooler to handle 450 watts, which allows you to set the clock to 1.5 GHz. But this is expensive, noisy, and sort of wasteful if you’re mostly handling E code.
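The fixed-clock example above can be written out as a tiny model. The per-operation energies here are reverse-engineered from the (admittedly made-up) 200 W / 250 W / 300 W figures in the example:

```python
# Fixed-clock heat model from the E/H example. Energy per op is
# made up: 200 nJ per "easy" op, 300 nJ per "hard" op, chosen so
# that 1 billion all-E ops/s dissipates 200 W and all-H 300 W.

E_NJ, H_NJ = 200.0, 300.0  # nanojoules per operation (illustrative)

def watts(freq_ghz, h_fraction):
    """Power output for a mix of E and H ops at a fixed clock."""
    nj_per_op = (1 - h_fraction) * E_NJ + h_fraction * H_NJ
    return freq_ghz * nj_per_op  # GHz * nJ/op = watts

print(watts(1.0, 0.0))  # all-E at 1 GHz   -> 200.0 W
print(watts(1.0, 0.5))  # 50/50 at 1 GHz   -> 250.0 W
print(watts(1.5, 0.0))  # all-E at 1.5 GHz -> 300.0 W
print(watts(1.5, 1.0))  # all-H at 1.5 GHz -> 450.0 W, over budget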
For fixed clock systems, it’s all about hitting that right balance of clocking as high as possible while keeping the cooling in check.
PS5 VARIABLE CLOCKS
Ok, so that’s “normal”: the clocks stay the same, but the wattage the system puts out changes based on the type of operations the game is running. So you have to balance the size of the cooler against the clock speed you pick, to handle the worst-case scenario.
With PS5, Sony has flipped this around. The power (watts) are fixed, while the clocks change.
Using the same example from before, let’s say that Sony has given the PS5 300 watts to work with at all times. The system inspects the kinds of operations it’s being asked to execute. If it’s being asked only to do E operations, then it clocks up to the maximum (here, 1.5 GHz) and does 1.5 billion E operations, resulting in 300 watts. On the other hand, if it’s asked to do only H operations, it will downclock to 1 GHz to maintain the 300 watt output, which means it can only do 1 billion operations.
Now obviously all the math is made up, but you can see the advantage. Compared to a hypothetical fixed-clock system whose cooling handles 300 watts, the PS5 would be faster whenever the workload includes any E operations, and exactly the same when performing only H operations.
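This is just the previous model turned inside out: instead of computing watts from a fixed clock, hold the watts fixed and solve for the clock. Same made-up per-op energies as before:

```python
# Variable-clock model: power budget is fixed, clock adapts to the
# E/H mix. Energy-per-op numbers are the same illustrative ones.

E_NJ, H_NJ = 200.0, 300.0   # nanojoules per op (made up)
POWER_BUDGET_W = 300.0
MAX_GHZ = 1.5               # design maximum clock

def clock_for_mix(h_fraction):
    """Highest clock (GHz) that keeps this mix within the power
    budget, capped at the design maximum."""
    nj_per_op = (1 - h_fraction) * E_NJ + h_fraction * H_NJ
    return min(MAX_GHZ, POWER_BUDGET_W / nj_per_op)

for h in (0.0, 0.5, 1.0):
    print(f"H fraction {h:.0%}: {clock_for_mix(h):.2f} GHz")
```

So an all-E workload runs at the full 1.5 GHz, a 50/50 mix at 1.2 GHz, and an all-H workload at 1.0 GHz, always dissipating exactly 300 W. The fixed-clock design from before would sit at 1.0 GHz in all three cases.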
In the real world, it won’t be that simple of course. We have no idea how often different types of operations occur, or whether they will be common in some games but not others, and thus how much time the PS5 will spend at its max clock. I believe the quote from Mark Cerny was “at or near maximum clock the majority of the time” which could mean just about anything. And diminishing returns at crazy high clock speeds could absolutely be a thing. And it may be a pain in the ass for developers if they find that their specific thing causes the clocks to tank.
But I don’t really buy that it’s a dumb design just because it has variable clocks. In a nutshell, allowing it to downclock for “H” operations means they can have much higher performance than they otherwise would for “E” operations. This helps them overcome the huge disadvantage the PS5 has in terms of raw compute hardware. 36 CUs vs 52 CUs implies a 44% advantage for the Series X, which Sony will reduce by half or more depending on scenario by using this strategy.
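For what it’s worth, the 44% figure and the 18% figure from earlier in the thread both fall out of the standard RDNA teraflops formula (CUs × 64 lanes × 2 ops per FMA × clock), using the publicly quoted specs of 52 CUs at 1.825 GHz for the Series X and 36 CUs at up to 2.23 GHz for the PS5:

```python
# Back-of-envelope TF comparison. 64 shader lanes per CU, 2 FLOPs
# per lane per cycle (fused multiply-add) is the usual convention.

def teraflops(cus, clock_ghz):
    return cus * 64 * 2 * clock_ghz / 1000

xsx = teraflops(52, 1.825)      # Series X, fixed clock
ps5_max = teraflops(36, 2.23)   # PS5 at its maximum clock
print(f"Series X: {xsx:.2f} TF, PS5 max: {ps5_max:.2f} TF")
print(f"Gap at PS5 max clock: {xsx / ps5_max - 1:.0%}")
print(f"Gap from CU count alone: {52 / 36 - 1:.0%}")
```

That is, the raw 44% CU advantage shrinks to roughly 18% whenever the PS5 actually holds its top clock, which is exactly the gap the first post in the thread is asking about.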
I will be Day One on the PS5 as well as the XSX, but I have a bad feeling the PS5 will have terrible return rates with this design, possibly a 360 red-ring-level disaster. Not stoking fanboy stuff, but how can two designs be so different? Cooling is a big issue, and running at pretty much full spec all the time is going to shorten the life cycle of the console. Just my opinion, of course.
I just want to know what they mean by predictable performance. How does a dev predictably know how their code performs? Cerny mentioned a “model GPU”, so I assume the frequency is not calculated and adjusted at runtime but determined beforehand. Does the shader compiler directly output the GPU frequency for a specific program, or how does this work?
I’d be surprised if the performance delta isn’t substantial. The bandwidth gap is just too big.
Time will tell of course, but I think we have already seen some hints in how the devs speak about them. For example, the Dirt 5 dev, speaking about the SX, said that right after porting, the game at 4K and higher settings than current gen was already hitting close to 120 fps and they were impressed, while when talking about the PS5 it was more like “yeah, we’ll support 120 fps on PS5 too.”
We will not understand the cooling system until we actually know what it is; currently it is all speculation and rumor. What we do know is that the SoC is very hard to cool (judging by the size of the console), especially if they really achieve those clock frequencies, which I actually doubt they will sustain.