Authors
Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, Sergey Tulyakov
Snap Inc.; Northeastern University
Summary
We present R2L, a deep (88-layer) residual MLP network that can represent the neural light field (NeLF) of complex synthetic and real-world scenes. It features a compact representation (~20MB storage), fast rendering (~30x speedup over NeRF), and significantly improved visual quality (1.4dB PSNR gain over NeRF), with no bells and whistles (no special data structures or parallelism required).
Abstract
The recent research explosion on Neural Radiance Fields (NeRF) shows the encouraging potential to represent complex scenes with neural networks. One major drawback of NeRF is its prohibitive inference time: rendering a single pixel requires querying the NeRF network hundreds of times. To resolve this, existing efforts mainly attempt to reduce the number of required sampled points, but the problem of iterative sampling still exists. In contrast, a Neural Light Field (NeLF) presents a more straightforward representation than NeRF for novel view synthesis: rendering a pixel amounts to a single forward pass without ray-marching. In this work, we present a deep residual MLP network (88 layers) to effectively learn the light field. We show that the key to successfully learning such a deep NeLF network is having sufficient data, for which we transfer knowledge from a pre-trained NeRF model via data distillation. Extensive experiments on both synthetic and real-world scenes show the merits of our method over counterpart algorithms. On the synthetic scenes, we achieve a 26-35x FLOPs reduction (per camera ray) and a 28-31x runtime speedup, while delivering significantly better rendering quality (1.4-2.8 dB average PSNR improvement) than NeRF, without any customized parallelism requirement.
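To make the architecture concrete, below is a minimal PyTorch sketch of a deep residual MLP light-field network; it is not the released implementation. The input size, width, and block count are illustrative placeholders (chosen so the linear layers total 88), and the ray encoding is assumed to be a fixed-size vector, e.g. positionally encoded points sampled along the ray as described in the paper.

```python
# Minimal sketch (not the authors' code) of a deep residual MLP NeLF.
# Assumption: each ray is pre-encoded into a fixed-size vector `ray_enc`.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(inplace=True),
            nn.Linear(width, width),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection keeps very deep stacks trainable.
        return self.act(x + self.body(x))

class DeepNeLF(nn.Module):
    # 1 head + 43 blocks x 2 + 1 tail = 88 linear layers (illustrative).
    def __init__(self, in_dim: int = 180, width: int = 256, num_blocks: int = 43):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(num_blocks)])
        self.tail = nn.Linear(width, 3)  # one forward pass -> RGB, no ray-marching

    def forward(self, ray_enc):
        return torch.sigmoid(self.tail(self.blocks(self.head(ray_enc))))

# Usage: one network query per pixel, batched over rays.
model = DeepNeLF()
rgb = model(torch.randn(4096, 180))  # (num_rays, 3)
```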
Contribution
- Methodologically, we present a brand-new deep residual MLP network for efficient novel view synthesis, offering a compact neural representation and fast rendering without requiring anything beyond 2D images. This is the first attempt to improve rendering efficiency via network architecture optimization.
- Our network represents complex real-world scenes as neural light fields. To resolve the data shortage problem when training the proposed deep MLP network, we propose an effective training strategy that distills knowledge from a pre-trained NeRF model (see the sketch after this list), which is the key to enabling our method.
- Practically, our approach achieves a 26-35× FLOPs reduction (28-31× wall-time speedup) over the original NeRF with even better visual quality, and performs favorably against existing counterpart approaches.
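The distillation strategy referenced above amounts to using a frozen pre-trained NeRF as a teacher that renders pseudo ground-truth colors for randomly sampled rays. The following is a hedged sketch of that idea: `pretrained_nerf.render`, `sample_random_rays`, and `encode_ray` are hypothetical helpers standing in for the teacher renderer, the ray sampler, and the ray encoding, and the hyperparameters are placeholders.

```python
# Sketch of data distillation from a pre-trained NeRF teacher to the deep
# NeLF student. Helper functions and hyperparameters are assumptions, not
# the paper's actual API.
import torch
import torch.nn.functional as F

def distill(student, pretrained_nerf, encode_ray, sample_random_rays,
            steps: int = 200_000, batch: int = 4096, lr: float = 5e-4):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        rays_o, rays_d = sample_random_rays(batch)  # random origins/directions
        with torch.no_grad():
            # Teacher pseudo labels; slow, but can be precomputed offline.
            target = pretrained_nerf.render(rays_o, rays_d)
        pred = student(encode_ray(rays_o, rays_d))   # single forward pass per ray
        loss = F.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because the pseudo labels come from NeRF renders rather than captured photos, arbitrarily many training rays can be generated, which is what supplies the "sufficient data" needed to train such a deep student network.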
Related Works
Efficient neural scene representation and rendering; Neural light field (NeLF); Knowledge distillation (KD)
Comparisons
NeRF, DONeRF, NSVF, NeX, AutoInt, X-Fields, RSEN, KiloNeRF