Gamers Nexus breaks down the Ampere architecture.
- Explains how the FP32 units work in Ampere
- Talks about the enhancements to the RT cores
- How tensor core performance increases thanks to sparse matrices
- GDDR6X
And much more…
Link :
He briefly describes how RT performance was being bottlenecked on Turing.
Apparently, the RT cores are in theory designed to run bounding box intersection and triangle intersection calculations in parallel. In practical applications this wasn't the case, and triangle intersections were causing a bottleneck in the ray tracing pipeline. This, however, is improved with Ampere's RT core implementation.
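
For anyone wondering what those two kinds of intersection tests actually are, here is a rough software sketch in Python. It is only an illustration, not how the RT cores implement it in hardware: it assumes the common slab test for the bounding box and the Möller-Trumbore test for the triangle, and the function names (`ray_hits_box`, `ray_hits_triangle`, `trace`) are made up for this post.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_box(origin: Vec3, direction: Vec3,
                 box_min: Vec3, box_max: Vec3) -> bool:
    """Slab test: cheap check against an axis-aligned bounding box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab; it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_triangle(origin: Vec3, direction: Vec3,
                      tri: Triangle) -> Optional[float]:
    """Moller-Trumbore test: returns the hit distance t, or None on a miss."""
    eps = 1e-9
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None


def trace(origin: Vec3, direction: Vec3, box_min: Vec3, box_max: Vec3,
          triangles: List[Triangle]) -> Optional[float]:
    """Box test first; only test triangles if the box is hit."""
    if not ray_hits_box(origin, direction, box_min, box_max):
        return None
    hits = [t for tri in triangles
            if (t := ray_hits_triangle(origin, direction, tri)) is not None]
    return min(hits) if hits else None
```

The box test is the cheap filter and the triangle test is the more expensive one, so in this sketch a ray only reaches the triangle math after it has hit the box. The code runs the two steps one after the other; it does not model any of the parallelism discussed in the video.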