FuseSR: Super Resolution for Real-time Rendering through Efficient Multi-resolution Fusion

Zhihua Zhong1,4, Jingsen Zhu1 (co-authors), Yuxin Dai3, Chuankun Zheng1, Yuchi Huo2,1, Guanlin Chen4, Hujun Bao1, Rui Wang1
1State Key Lab of CAD&CG, Zhejiang University; 2Zhejiang Lab; 3Zhejiang A&F University; 4Zhejiang University City College

SIGGRAPH Asia 2023

[Teaser figure — panels: Input / Ours / GT / NSRR] 8×8 (64×) super-resolution result in KITE.


The workload of real-time rendering is steeply increasing as demand for high resolution, high refresh rates, and high realism rises, overwhelming most graphics cards. To mitigate this problem, one of the most popular solutions is to render images at a low resolution to reduce rendering overhead, and then accurately upsample the low-resolution rendered image to the target resolution, a.k.a. super-resolution. Most existing methods focus on exploiting information from low-resolution inputs, such as historical frames; the absence of high-frequency details in those LR inputs makes it hard for them to recover fine details in their high-resolution predictions. We instead take LR images together with easily obtainable HR G-buffers as input, which requires the network to align and fuse features across multiple resolution levels. We introduce an efficient and effective H-Net architecture to solve this problem and significantly reduce rendering overhead without noticeable quality deterioration. Experiments show that our method produces temporally consistent reconstructions in 4×4 and even challenging 8×8 upsampling cases at 4K resolution with real-time performance, with substantially improved quality and a significant performance boost compared to existing works.

Multi-resolution Alignment

To get the utmost out of HR auxiliary features, we propose H-Net, which aligns multi-resolution data in a homologous low-resolution screen space. Fusing multi-resolution features into a low-resolution feature map not only aggregates all the data that share the same screen-space coordinates, but also compresses them into a compact low-resolution form, so H-Net gains advantages in both quality and speed.
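The alignment step above can be sketched with a space-to-depth fold: each r×r block of HR G-buffer samples that projects to one LR pixel is stacked into channels, then concatenated with the LR image features. This is a minimal NumPy illustration of the idea, not the paper's actual H-Net implementation; the function names and shapes are our own assumptions.

```python
import numpy as np

def space_to_depth(x, r):
    # Fold an HR feature map (C, H*r, W*r) into LR screen space
    # (C*r*r, H, W): every LR pixel now holds all r*r HR samples
    # that land at the same screen-space location.
    C, Hr, Wr = x.shape
    H, W = Hr // r, Wr // r
    x = x.reshape(C, H, r, W, r)
    x = x.transpose(0, 2, 4, 1, 3)        # (C, r, r, H, W)
    return x.reshape(C * r * r, H, W)

def fuse(lr_feat, hr_gbuf, r):
    # Concatenate LR image features with the folded HR G-buffer
    # features along the channel axis, all at LR spatial size.
    return np.concatenate([lr_feat, space_to_depth(hr_gbuf, r)], axis=0)
```

Operating on this fused LR tensor lets the network see HR G-buffer detail while paying only LR-resolution compute, which is where the speed advantage comes from.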

BRDF Pre-integrated Demodulation

[Figure — panels: LR RGB space / LR demodulated irradiance / HR pre-integrated BRDF]

We further improve the quality using demodulation. With demodulation, the inference target of the neural network changes from the high-frequency RGB color to a low-frequency irradiance term, which is easier to upsample; the final HR color is then recovered by modulating with the HR pre-integrated BRDF.
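The demodulate/remodulate round trip can be sketched as follows. This is an illustrative NumPy sketch under our own assumptions (the epsilon value and function names are not from the paper):

```python
import numpy as np

EPS = 1e-4  # assumed clamp to avoid division by zero in dark BRDF regions

def demodulate(lr_color, lr_brdf):
    # Factor the pre-integrated BRDF out of the rendered color so the
    # network only has to upsample the smoother irradiance signal.
    return lr_color / np.maximum(lr_brdf, EPS)

def remodulate(hr_irradiance, hr_brdf):
    # Multiply the upsampled irradiance back with the pre-integrated
    # BRDF rendered directly at the target (HR) resolution.
    return hr_irradiance * hr_brdf
```

Because the HR pre-integrated BRDF is rendered exactly at target resolution, its high-frequency material detail re-enters the output without the network ever having to predict it.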




PSNR (dB, higher is better):

Kite      32.33  31.22  27.74  28.00  29.12  28.30  30.21  25.00  25.72
Showdown  36.32  31.42  30.27  29.17  26.29  29.31  33.61  29.17  25.62
Slay      37.02  34.41  35.42  35.39  32.39  34.94  34.26  32.12  33.47
City      28.94  28.66  27.65  28.23  26.56  27.15  27.20  25.95  26.46


SSIM (higher is better):

Kite      0.933  0.900  0.832  0.829  0.887  0.893  0.899  0.765  0.770
Showdown  0.976  0.949  0.945  0.914  0.866  0.917  0.955  0.914  0.813
Slay      0.972  0.958  0.962  0.963  0.928  0.944  0.957  0.939  0.943
City      0.921  0.901  0.899  0.896  0.836  0.888  0.916  0.873  0.873

FuseSR outperforms state-of-the-art methods in both quality and speed.

Temporal Results


Citation

    @inproceedings{zhong2023fusesr,
        title={FuseSR: Super Resolution for Real-time Rendering through Efficient Multi-resolution Fusion},
        author={Zhong, Zhihua and Zhu, Jingsen and Dai, Yuxin and Zheng, Chuankun and Chen, Guanlin and Huo, Yuchi and Bao, Hujun and Wang, Rui},
        booktitle={SIGGRAPH Asia 2023 Conference Papers},
        year={2023}
    }