Upcoming Changes: Hardware Accelerated ASTC Texture Decoding (Patreon)
Hey Patrons! Continuing our promise to keep you updated on our behind-the-scenes development process, we have more information to share on an upcoming feature. In this mini-update, we are going to take a look at our upcoming Adaptive Scalable Texture Compression (ASTC) GPU decoder.
To explain to you why we developed this feature, we need to provide you with a bit of additional information on this texture format.
Many Nintendo Switch games use ASTC textures (Super Mario Odyssey, Luigi's Mansion 3, Astral Chain, and others). However, some games use much larger textures than others. Astral Chain, for example, commonly uses 4096x4096 textures. As texture sizes go, this is enormous. These textures, like all others, must be decoded before they can be rendered. This is where we have run into noticeable issues, especially in games like Astral Chain and Luigi's Mansion 3 that rely heavily on large ASTC textures. At the moment, neither Nvidia's nor AMD's discrete desktop GPUs natively support ASTC texture decoding, and while Intel iGPUs (UHD 5XX series and newer) do support it, those integrated GPUs lack the raw power, video memory, and adequate driver support needed to run many emulated Nintendo Switch games.
Since AMD and Nvidia GPUs are the most commonly used, these ASTC textures currently have to be decoded on the CPU whenever they are encountered. For many games (such as Super Mario Odyssey) this is not an issue, because they use smaller textures. However, in games that use larger textures, noticeable stutters and pauses in gameplay occur while this CPU decoding takes place.
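To give a rough sense of scale, here is a small Python sketch of how much data a single 4096x4096 texture represents before and after decoding. It relies on two facts about the format: every ASTC block occupies 128 bits regardless of its footprint, and footprints range from 4x4 to 12x12 texels. The footprints in the loop are illustrative assumptions, since the block sizes these games actually use are not stated here, and the function names are made up for this example.

```python
ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def astc_compressed_bytes(width, height, block_w, block_h):
    """Size of an ASTC texture: one 16-byte block per (block_w x block_h) tile."""
    blocks_x = (width + block_w - 1) // block_w   # round up partial tiles
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


def rgba8_decoded_bytes(width, height):
    """Size of the same texture once decoded to 8-bit RGBA (4 bytes per texel)."""
    return width * height * 4


if __name__ == "__main__":
    width = height = 4096
    decoded = rgba8_decoded_bytes(width, height)
    # Footprints below are assumptions chosen for illustration only.
    for block_w, block_h in [(4, 4), (8, 8), (12, 12)]:
        compressed = astc_compressed_bytes(width, height, block_w, block_h)
        print(f"{block_w}x{block_h}: {compressed / 2**20:.2f} MiB compressed, "
              f"{decoded / 2**20:.0f} MiB decoded")
```

Whatever the footprint, the decoder has to produce every texel of the output, so the work per texture grows with the texture's area.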
To solve this problem we had two options. The first was to make CPU decoding multi-threaded, and the second was to write a custom shader that performs faster ASTC decoding on the GPU. In testing, we found that for multi-threaded CPU decoding to be as fast as our GPU-decoded solution, you would need 24+ CPU threads. Since the average user does not have access to a CPU with this many threads, multi-threaded CPU decoding was dropped in favor of our GPU approach.
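Both options rest on the same property of the format: each ASTC block can be decoded independently of its neighbours, so the work can be divided among CPU threads or among GPU shader invocations. The Python sketch below shows only that partitioning idea and is not our decoder. `decode_block_stub` is a hypothetical placeholder that does no real ASTC decoding, and all names are invented for illustration. Note also that CPython threads will not speed up pure-Python work like this; a real multi-threaded decoder would be native code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(total_rows, workers):
    """Split range(total_rows) into at most `workers` contiguous, near-equal chunks."""
    base, extra = divmod(total_rows, workers)
    chunks, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size:  # skip empty chunks when there are more workers than rows
            chunks.append((start, start + size))
        start += size
    return chunks


def decode_block_stub(block_bytes):
    # Hypothetical stand-in: a real decoder would expand the 16-byte block
    # into its texels. Here we just return the block length.
    return len(block_bytes)


def decode_rows(blocks, row_range):
    """Decode every block in the given range of block rows."""
    first, last = row_range
    return [[decode_block_stub(b) for b in blocks[r]] for r in range(first, last)]


def decode_parallel(blocks, workers):
    """Decode a 2D grid of blocks, one contiguous chunk of rows per worker."""
    chunks = split_rows(len(blocks), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: decode_rows(blocks, c), chunks))
    # pool.map preserves input order, so rows come back in their original order.
    return [row for part in parts for row in part]
```

No chunk needs results from any other chunk, which is why the same split maps naturally onto a GPU, where far more blocks can be processed at once than on a typical CPU.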
Offloading this task to the less heavily utilized GPU made sense, and it has the added benefit of keeping the texture in GPU memory, rather than downloading it to the CPU, decoding it there, and then uploading the decoded texture back to the GPU.
Our initial work on GPU decoding is looking very promising. Below you can find three examples from the aforementioned titles (Astral Chain and Luigi's Mansion 3). In these tests, the improved gameplay/audio synchronization shows just how much faster our new GPU-decoded approach is.
While these are examples of games that see a very measurable improvement, you can expect to see much smoother gameplay in any title that utilizes this ASTC texture format.
This decoder is very close to a release, but we have a few additional bugs to iron out. Keep your eyes peeled for updates.
Furthermore, work is also ongoing on both our recent Texture Cache and Buffer Cache rewrites, as well as the Vulkan/OpenGL improvements teased in our last progress report. When we have more info to share, we will do so as soon as possible. Hopefully you enjoyed this mini-update, and once again thank you for your continued support.
- The yuzu development team