A More Detailed Look at Sampler Feedback Streaming

This is an explainer post for regular gaming enthusiasts about Sampler Feedback Streaming, covering these topics:

  • How textures work, in a nutshell
  • History of texture streaming
  • Classic, PRT, PRT+ (SFS)
  • Differences between platforms (traditional PCs vs. XVA, the Xbox Velocity Architecture, or XVA-enabled PCs)

How textures work, in a nutshell

  • Textures need mipmapping and filtering to prevent flickering and aliasing.

MipMapping and Texture Filtering:

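MipMapping was made for this. When a texture is drawn far away or at a steep angle, many texels land on a single screen pixel, and reading just one of them makes the image shimmer. A mipmap stores pre-shrunk, pre-averaged copies of the texture, so the sampler can read from a copy whose texel size roughly matches the pixel.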

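A mipmapped texture asset is a chain of progressively smaller copies of the same image, each one half the width and half the height of the one before it.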

The largest, most detailed level is usually mip0; the smallest can be mip8, mip12 or beyond, depending on the texture size. There's no fixed limit on the number of levels: the chain simply ends when it reaches a 1x1 texel, so a 4096x4096 texture has 13 levels (mip0 to mip12). As the number goes up, the detail level goes down.
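As a quick illustration of the halving rule, here's a minimal Python sketch that lists every level of a mip chain. `mip_chain` is just a name made up for this example, not part of any graphics API.

```python
def mip_chain(width, height):
    """Return the (width, height) of every mip level, mip0 first.

    Each level halves both dimensions (never below 1 texel), and the
    chain ends once it reaches a 1x1 level.
    """
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


chain = mip_chain(4096, 4096)
print(len(chain))  # 13 levels: mip0 .. mip12
for level, (w, h) in enumerate(chain[:3]):
    print(f"mip{level}: {w}x{h}")
```

Non-square textures work too: the shorter side just stays at 1 while the longer side keeps halving.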

MipMapping also gives developers the option to use lower-detail mips on less significant objects, which can save a huge amount of memory and memory bandwidth.

Alongside MipMapping, texture filtering is also important. Anisotropic filtering is generally the way to go, but there are other options to choose from for different scenes.

Sampling: Filtering happens during sampling. Sampling is the process of fetching data from a texture for a pixel.

Imagine you're watching the world through a tiny hole: the scene behind that hole is the texture asset, and your eye is doing anisotropic filtering at the same time!

(I don't want to bother you with every technical detail, so the differences between Bilinear, Trilinear and Anisotropic … filtering/sampling won't be covered here. You can google them later.)

While sampling, the sampler has to work out where it is sampling and how much detail (which mip level) it needs.
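To make "how much detail it needs" a bit more concrete, here's a minimal Python sketch of the classic isotropic LOD calculation (the formula given in the OpenGL specification): measure how many texels one screen pixel spans, then take log2 of that. Real GPUs use approximations of this, and anisotropic filtering treats the two axes separately, so take it as an illustration only; `mip_lod` and its parameters are names made up for this example.

```python
import math


def mip_lod(dudx, dvdx, dudy, dvdy, tex_w, tex_h, max_lod):
    """Approximate the (isotropic) mip level a sampler picks for one pixel.

    dudx/dvdx: change in the UV coordinate when moving one pixel right.
    dudy/dvdy: change in the UV coordinate when moving one pixel down.
    Multiplying by the texture size turns these into texels per pixel.
    """
    footprint_x = math.hypot(dudx * tex_w, dvdx * tex_h)
    footprint_y = math.hypot(dudy * tex_w, dvdy * tex_h)
    rho = max(footprint_x, footprint_y)  # texels covered by one pixel
    lod = math.log2(rho) if rho > 0 else 0.0
    return min(max(lod, 0.0), float(max_lod))


# A 1024x1024 texture squeezed onto 256x256 pixels: 4 texels per pixel.
print(mip_lod(1 / 256, 0, 0, 1 / 256, 1024, 1024, 10))  # 2.0 -> mip2
```

When the texture is magnified (less than one texel per pixel) the log goes negative and the result is clamped to mip0.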

Before Sampler Feedback, this information (where the sampler read from and which mip level it wanted) was simply discarded. That's a waste, because it's exactly the information a texture/LOD streaming engine needs most.

That's about to change, but first we have to introduce “Texture Streaming”.

Still following along? See you in the replies, where I cover texture streaming and the remaining topics.


Nice OT. Good info.

  • History of texture streaming: Classic, PRT, PRT+ (SFS)
    • Classic
    • PRT
    • PRT+SF
    • SFS

Classic Texture Streaming

Let's start with classic texture streaming, the most basic and simple approach. As we've seen with mipmapping, developers now have a set of smaller copies of each texture, each level taking only a quarter of the memory of the one above it (half the width, half the height).

So, to save precious memory, developers started looking for ways to use the coarser, high-numbered mips (mip8, for example). Before classic texture streaming, everything in a game level was loaded at mip0. With classic streaming, developers can use a different mip level for each object, depending on its distance or on-screen size.
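Here's a minimal Python sketch of what a classic, per-object rule could look like. The distance-doubling heuristic and the names `pick_object_mip` and `full_detail_distance` are hypothetical, just one plausible way to pick a mip from distance; real engines use their own metrics (often on-screen size).

```python
import math


def pick_object_mip(distance, full_detail_distance, mip_count):
    """Hypothetical classic-streaming rule: one mip level per object.

    Up to full_detail_distance the object gets mip0. Each doubling of the
    distance beyond that roughly halves its on-screen size, so one coarser
    mip is enough. The result is clamped to the last mip in the chain.
    """
    if distance <= full_detail_distance:
        return 0
    mip = math.floor(math.log2(distance / full_detail_distance))
    return min(mip, mip_count - 1)


for d in (5, 20, 40, 10_000):
    print(d, pick_object_mip(d, full_detail_distance=10, mip_count=13))
```

Note the limitation: the whole object gets one mip, even if only a small part of it is close to the camera. That's exactly what PRT addresses.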

Partially Resident Textures or Virtual Texturing

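PRT (Partially Resident Textures) is AMD's name for the hardware feature, which Direct3D calls Tiled Resources, while Virtual Texturing is the term used by engines such as idTech (MegaTexture) and Unreal Engine. Broadly speaking, they're the same idea.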

As time moves on, mip0 keeps getting larger. We're seeing 4K and 8K textures now, which can be a huge burden on memory when loaded as a whole.
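To put numbers on that burden, here's a small Python sketch of the memory a square texture takes, assuming an uncompressed 4-byte-per-texel format such as RGBA8 (block compression shrinks these numbers, but the proportions stay similar). `texture_bytes` is a made-up helper name.

```python
def texture_bytes(size, bytes_per_texel=4, with_mips=True):
    """Memory for a square size x size texture, optionally with a full mip chain.

    Assumes an uncompressed 4-byte-per-texel format (e.g. RGBA8) by default.
    """
    total = 0
    while True:
        total += size * size * bytes_per_texel
        if not with_mips or size == 1:
            return total
        size //= 2


MIB = 1024 * 1024
for size in (4096, 8192):
    print(size, texture_bytes(size, with_mips=False) / MIB, texture_bytes(size) / MIB)
```

Under these assumptions a single 8K texture is 256 MiB at mip0 alone, and about 341 MiB with its full mip chain (the chain adds roughly a third).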

So, what about just loading parts of them?

PRT uses the same idea as virtual memory: we don't have to load every part of a texture into memory. Instead, we divide the large texture into small tiles.
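For a feel of how many tiles that means, here's a Python sketch assuming D3D12-style tiled (reserved) resources, where every tile is 64 KB; for a 4-byte-per-texel format that works out to 128x128 texels per tile. `tiles_per_mip` is a made-up helper, and the tiny mips that the API packs together into a "mip tail" are simply left out.

```python
TILE_BYTES = 64 * 1024  # D3D12 tiled (reserved) resources use 64 KB tiles
TILE_TEXELS = 128       # 128x128 texels per tile for 4-byte-per-texel formats


def tiles_per_mip(size):
    """Tile count for each mip of a square, power-of-two texture.

    Only mips at least one tile wide are counted; smaller mips are packed
    together in a 'mip tail' by the API and are left out here.
    """
    counts = []
    while size >= TILE_TEXELS:
        per_side = size // TILE_TEXELS
        counts.append(per_side * per_side)
        size //= 2
    return counts


assert TILE_TEXELS * TILE_TEXELS * 4 == TILE_BYTES
print(tiles_per_mip(8192))  # [4096, 1024, 256, 64, 16, 4, 1]
```

So an 8K texture's mip0 alone is 4096 separate tiles, each of which can be loaded or dropped independently.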

[Figure: MinMip map of an 8x8 tile area, each tile showing the mip level requested for it]

By dividing the large texture into an array of tiles, we get much finer-grained control: each tile can be loaded or dropped on its own.

Different parts of the texture can then come from different mips: some tiles from mip0, others from mip3 or so.

The MinMip map above shows an 8x8 area, with different tiles requesting different mip levels.

In this particular example, every tile takes the same amount of memory. A single mip1 tile covers (2^1)^2 = 4 times the area of a mip0 tile, so its texels are spread 4 times thinner: 4 times less detail over that area, in the same memory. Likewise, a mip2 tile covers (2^2)^2 = 16 mip0 tiles' worth of area at 1/16 of the detail, and a mip3 tile can cover all 64 single-handedly. Awesome, right? But its quality is extremely poor, so we can only use it on the least significant parts.

Before PRT, we needed 64 tiles' worth of memory to cover that 8x8 area. With PRT, we can cover it with 1+3+3+1 = 8. Assuming the mip selection is efficient, that's a huge saving, isn't it?
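The counting rule can be sketched in Python like this: a mip-k tile spans a 2^k x 2^k block of mip0 tiles, so all requests for mip k inside the same block share one tile. The 8x8 map below is a hypothetical example, not the one from the figure, and `tiles_needed` is a made-up helper. It also counts only the finest tile per area; real engines usually keep coarser mips resident as fallbacks too.

```python
def tiles_needed(min_mip):
    """Count the tiles needed at each mip level to satisfy a MinMip map.

    min_mip[y][x] is the mip level requested for one mip0-tile-sized area.
    A mip-k tile spans a 2^k x 2^k block of those areas, so every cell in
    the same block that asks for mip k is served by one shared tile.
    """
    needed = {}
    for y, row in enumerate(min_mip):
        for x, mip in enumerate(row):
            needed.setdefault(mip, set()).add((x >> mip, y >> mip))
    return {mip: len(tiles) for mip, tiles in sorted(needed.items())}


# Hypothetical 8x8 map: mip2 everywhere, mip1 in the top-left 4x4
# quadrant, and mip0 for the single top-left tile.
example = [[2] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        example[y][x] = 1
example[0][0] = 0

counts = tiles_needed(example)
print(counts, sum(counts.values()), "tiles instead of", 8 * 8)
```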

Well, that's where things get tricky: how do we make sure the mip selection is efficient?

Before Sampler Feedback, developers couldn't optimize this down to the last drop. They could only guess at visibility, importance and so on, without any direct information. It's like riding a bike with no hands on the handlebars: yes, you can still steer with your weight, balance and speed, but isn't that shaky?

PRT+ (Sampler Feedback)

Time to save the day! With DirectX 12 Ultimate, developers can now get reports from the sampler and use them to minimize artifacts, lag spikes and wasted memory! We can finally put our hands back on the handlebars :wink:

Traditional PRT solutions were based on guesses.

PRT+ (PRT with sampler feedback) is based on hard facts. Samplers are the real end consumers of texture assets, so they know exactly what they need (unlike some poor markets in other areas of gaming, just kidding LOL). With SF, the streaming engine only streams what the sampler actually asked for, so far less is wasted.

However, PRT+ does need hardware support: at the very least a modern GPU and an SSD. And even PRT+ can be refined and optimized further. Which finally brings us to the almighty

Sampler Feedback Streaming

SFS is built on PRT+, and PRT+ is built on PRT plus Sampler Feedback. SFS is a complete solution for texture streaming, including both hardware and software optimizations.

First, Microsoft built caches for the Residency Map and the Request Map, and records asset requests on the fly. The difference from traditional PRT methods is a bit like navigation: previously you had to check a paper map, now you have a GPS.
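To show what those two maps do at sampling time, here's a minimal Python sketch, assuming dict-based maps keyed by tile region. The residency map clamps the LOD so a missing tile is never read, and any wish for finer detail is recorded in a request map. In D3D12 these roles are roughly played by the LOD clamp argument of a texture sample and by a MinMip feedback map written with `WriteSamplerFeedback`; the Xbox-specific hardware caches aren't modeled here, and `sample_lod` is a made-up name.

```python
import math


def sample_lod(desired_lod, region, residency_map, request_map):
    """LOD used when sampling a partially resident texture (hypothetical).

    residency_map[region] is the finest mip currently in memory there.
    If the sampler wants more detail than is resident, the wish is written
    into request_map and the sample falls back to the resident mip, so a
    missing tile is never read.
    """
    finest_resident = residency_map[region]
    if desired_lod < finest_resident:
        wanted = math.floor(desired_lod)
        request_map[region] = min(request_map.get(region, wanted), wanted)
    return max(desired_lod, finest_resident)


residency = {(0, 0): 3}
requests = {}
print(sample_lod(0.5, (0, 0), residency, requests))  # 3: clamped to resident mip3
print(requests)                                      # {(0, 0): 0}
```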

Second, you need a fast SSD to make PRT+ work while squeezing everything you can out of RAM. You wouldn't want to use an HDD with PRT+, because when an asset request comes in, it has to be answered fast (within milliseconds!). The SSD on Xbox is prioritized for game asset streaming, to minimize latency down to the last bit.
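A rough back-of-the-envelope sketch in Python of why drive speed matters: how many 64 KB tiles fit into one 60 fps frame. The throughput figures are assumptions for illustration (2.4 GB/s is the raw SSD rating Microsoft gave for Xbox Series X; ~100 MB/s is a typical HDD), and this ignores latency entirely. For scattered tile requests, an HDD's seek time per read is an even bigger problem than its throughput.

```python
TILE_BYTES = 64 * 1024  # one D3D12-style 64 KB tile


def tiles_per_frame(bytes_per_second, fps):
    """How many 64 KB tiles a drive can deliver in one frame, ignoring latency."""
    return bytes_per_second // fps // TILE_BYTES


# Assumed figures: 2.4 GB/s (Xbox Series X raw SSD rating) vs ~100 MB/s HDD.
for name, speed in (("SSD", 2_400_000_000), ("HDD", 100_000_000)):
    print(name, tiles_per_frame(speed, fps=60))
```

Under these assumptions that's roughly 610 tiles per frame for the SSD versus about 25 for the HDD.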

Third, Microsoft implemented a new hardware method for texture filtering and sharpening, used to smooth the transition as a texture loads in from mip8 to mip4 to mip0, etc. It's not magic, but it works like magic:

As we've said, the sampler knows what it needs. When mip0 is requested, the developer can answer with mip 0.8 on frame 1, mip 0.4 on frame 2, and finally mip0 on frame 3.

The fractional part is used by texture filtering, so the filter can blend between two neighboring mips and present the smoothest possible transition between LOD changes.

This also gives the storage system more time to load assets without showing artifacts.
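Here's a minimal Python sketch of what a fractional mip means, assuming it behaves like ordinary trilinear filtering between the two nearest mips (the exact behavior of the Xbox hardware filter isn't documented in this thread). `trilinear_weights` is a made-up helper name.

```python
import math


def trilinear_weights(lod):
    """Split a fractional LOD into the two mips it blends and their weights.

    LOD 0.8 means: 20% of mip0 blended with 80% of mip1.
    """
    base = math.floor(lod)
    frac = lod - base
    if frac == 0:
        return {base: 1.0}
    return {base: 1.0 - frac, base + 1: frac}


# The hand-off described above: mip 0.8 -> 0.4 -> 0 over three frames.
for frame, lod in enumerate((0.8, 0.4, 0.0), start=1):
    weights = trilinear_weights(lod)
    print(frame, {mip: round(w, 2) for mip, w in weights.items()})
```

So instead of popping from mip1 to mip0 in one frame, the freshly loaded mip0 fades in over a few frames.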

These hardware optimizations, combined with PRT+, together make up what we know as Sampler Feedback Streaming. Its potential is as wild as that of Mesh Shaders and Ray Tracing.

Really can’t wait to see a true next gen game with these capabilities enabled!

:yum: Sierra 117, signing off


Ah, almost forgot about the platform difference thing!

For PCs, PRT+ is the way to go. I think that, given the large performance margin of high-end PC cards, the mild efficiency loss without the hardware optimizations should be OK. While the RTX 3070 is capable of 4K rendering, sadly Nvidia has limited its 4K future by giving it only 8 GB of VRAM. It's a bummer on Nvidia's side, but also a chance for Xbox and AMD ;). And maybe, with wide adoption of PRT+, games can make better use of that 8 GB of VRAM and make the 3070 4K-capable again.

For PlayStation, due to its complete lack of sampler feedback capability, Sony has chosen another route: its blazing-fast SSD. I think traditional PRT solutions should be good enough on PS5.

With Microsoft, AMD and Nvidia all on board, DX12U finally has a chance to revamp the old graphics pipeline built back in 2005. The future is more exciting than anything that happened during the whole 8th generation.


Can’t wait to see SFS, VRS, and mesh shading used in conjunction.


This is some great stuff! XSX is built with some serious tech

Great OT! This is a true trump card for Xbox's future.

Cool. So the Series S/X and PS5 will be pretty even overall in I/O and SSD. It's a shame an Xbox hardware person didn't describe it the way Cerny described the PS5's SSD/I/O and its benefits for game development. I guess they did talk about the Velocity Architecture a lot, but when it's explained like in the OP, it's clear that SSD/I/O is just as much of a focus for Xbox as it is for PS5.


Perhaps we have already seen it (Hellblade 2)


They did showcase a demo for SS that, while short, was super insightful.

(starts at 2:50)


Bold Prediction: XSX I/O will be significantly faster in terms of effective throughput (i.e. how fast unique assets can get into RAM).

How? I bet games with realistic art styles will eventually start using machine learning on XSX to upscale textures at runtime, which could give a 2x-4x boost to effective overall I/O.


Great thread, very informative. Thanks for taking the time to do it.


Claire Andrews already covered this in March on the Game Stack channel:

DirectX 12 Sampler Feedback | Game Stack Live - YouTube

There is also extensive documentation with source code and everything at Sampler Feedback | DirectX-Specs (microsoft.github.io)

I think MS's dive into sampler feedback was deep enough :wink:


@sierra-117

Appreciate the effort you put into this. Now show us some example shader code … :stuck_out_tongue_winking_eye:

With commentary

Hey friends, don’t post non-verified insiders from other forums as insiders here. Posts will be taken down.


To anyone who's interested in how texture sampling works, these slides from Stanford's computer graphics course are great:
