They do use the CUs, but there is also added hardware to make them more efficient for the job.
The CUs are able to process a single operation involving two 32-bit floating-point numbers. Without the extra hardware for doing math on integers (which are smaller), the full CU would be just as occupied by a single int operation.
The extra hardware allows it to perform up to 4 int operations at the same time, significantly increasing the throughput. The CUs will still be occupied and unable to perform shader work when that happens, but for a console that's actually preferable to losing extra die space to completely separate cores.
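To make the "4 int operations at the same time" idea concrete, here is a rough software analogy in Python. It packs four 8-bit values into one 32-bit word and adds all four lanes with a single word-wide add. This is only a sketch of the packing idea; the helper names are made up for illustration, and it says nothing about how the actual CU hardware is wired.

```python
# Software analogy only: four 8-bit lanes packed into one 32-bit word.
# Function names are hypothetical, chosen for this illustration.

LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH1 = 0x80808080  # top bit of every 8-bit lane


def pack_u8x4(values):
    """Pack four unsigned 8-bit ints into one 32-bit word (lane 0 is lowest)."""
    assert len(values) == 4 and all(0 <= v <= 0xFF for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_u8x4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add_u8x4(a, b):
    """Add four lanes at once; each lane wraps modulo 256.

    The low 7 bits of each lane are added normally, then the top bit is
    fixed up with XOR so a carry never spills into the neighbouring lane.
    """
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH1)


a = pack_u8x4([1, 2, 3, 250])
b = pack_u8x4([10, 20, 30, 10])
print(unpack_u8x4(add_u8x4(a, b)))  # [11, 22, 33, 4]
```

One add on the 32-bit word does the work of four separate 8-bit adds, which is the same reason narrower integers can get a higher peak rate out of the same register width.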
Even without RPM, the peak int8 and int4 performance would be the same as the single-precision FP rate, so I wouldn't read too much into it not being listed.
Not that it wouldn't matter, but the int4 and int8 hardware isn't there to enable the GPU to process integer math, it's there to accelerate it.
Basically, each CU is big enough to handle a single operation on two 32-bit floating-point numbers (the 16 TFLOPS in this case).
Without RPM the int4 rate would also be 16 TOPS, so they could list that.
What RPM does is add a bit of hardware so the CU can process more numbers at once when the numbers are not big enough to fill the whole registers. So essentially it increases the throughput (it does not enable int math) by allowing more operations to be done at the same time.
For int8 you would be able to process up to 4 int operations per CU, so 16 × 4 = 64 TOPS, and for int4 up to 8, so 128 TOPS. For GPUs without RPM, that would be a constant 16 in all cases.
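The arithmetic above can be written out as a quick Python sketch. It assumes a 32-bit lane and the 16 figure used in this post as the base rate; `peak_rate` is a made-up helper for illustration, not anything from a vendor spec.

```python
# Back-of-the-envelope sketch of the packing arithmetic in the post.
# Assumptions: 32-bit lanes, base rate of 16 (the figure used above).

def peak_rate(base_rate, operand_bits, lane_bits=32, packed=True):
    """Peak rate when operands narrower than a lane can be packed into it."""
    if not packed:
        # Without packing, every operation occupies a full lane.
        return base_rate
    ops_per_lane = lane_bits // operand_bits
    return base_rate * ops_per_lane


BASE = 16
for name, bits in [("fp32", 32), ("int8", 8), ("int4", 4)]:
    with_rpm = peak_rate(BASE, bits)
    without_rpm = peak_rate(BASE, bits, packed=False)
    print(f"{name}: {with_rpm} with packing, {without_rpm} without")
```

With these assumptions it prints 16 for fp32, 64 for int8 and 128 for int4 with packing, and a flat 16 without.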
TL;DR: the GPU is definitely able to handle int math, and them not listing it does not necessarily mean that they don't support the acceleration for it.
AFAIK int8 would be double the FP16 throughput and int4 would be quadruple. In the Hot Chips slides they talk about an ML inference performance boost of 3x to 10x. You do not get that by just having the same throughput, right?
Not that I know of. Infinity Cache is an extra cache to work around bandwidth issues from the PC GPUs only having 512GB/s max. The XSX die shot does not show the same kind of structures, and Infinity Cache is absolutely not needed for the XSS. Infinity Cache is also 128MB on top of the normal caches, and it is completely absent from the XSX SoC, which has way less cache in total, including the CPU, IIRC.
FP32 is half the throughput of FP16. Remember: FP16 is half precision and FP32 is single precision. The 12 TF is FP32, so it is 24 TF with FP16. For int8 and int4 you don't have that type of measurement because they are not "floating point operations".
I think you are being too generous. I would say it is physically impossible when you see how much space the cache takes up on an 80 CU chip, and when you also consider that the PS5's APU is significantly smaller than the Series X's.
But I also believe the PS5 has something up its sleeve, because everyone keeps telling me that the difference between PS5 and XSX will be negligible. And I can't imagine how that is possible unless there is some special secret sauce.
To those people: I have already said on here that Sony has already shown their "secret sauce", and it is to do with IO, audio and dynamic clocks. Sony has spoken at length about the PS5 now; if there was anything else that was interesting they would have said it months ago. This honestly reminds me a bit of the ridiculous things about the One with the dual GPUs etc lol.
PlayStation fans honestly just need to accept that the Series X is the more capable machine, just like Xbox fans did back in 2013.