Authors
Thu Nguyen-Phuoc, Feng Liu, Lei Xiao
Reality Labs Research, Meta
Summary
Given a neural implicit scene representation trained on multiple views of a scene, SNeRF stylizes the 3D scene to match a reference style. SNeRF works with a variety of scene types (indoor scenes, outdoor scenes, and 4D dynamic avatars) and generates novel views with cross-view consistency.
Abstract
This paper presents a stylized novel view synthesis method. Applying state-of-the-art stylization methods to novel views frame by frame often causes jittering artifacts due to the lack of cross-view consistency. Therefore, this paper investigates 3D scene stylization, which provides a strong inductive bias for consistent novel view synthesis. Specifically, we adopt the emerging neural radiance fields (NeRF) as our choice of 3D scene representation for their capability to render high-quality novel views for a variety of scenes. However, because rendering a novel view from a NeRF requires a large number of samples, training a stylized NeRF requires an amount of GPU memory that exceeds the capacity of an off-the-shelf GPU. We introduce a new training method that addresses this problem by alternating the NeRF and stylization optimization steps. This method enables us to make full use of our hardware memory capacity to both generate images at higher resolution and adopt more expressive image style transfer methods. Our experiments show that our method produces stylized NeRFs for a wide range of content, including indoor, outdoor and dynamic scenes, and synthesizes high-quality novel views with cross-view consistency.
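To see why a frame-level style loss strains GPU memory, a rough back-of-envelope estimate helps. The resolution, sample count, and network width below are illustrative assumptions, not figures from the paper: backpropagating a style loss through a fully rendered frame means keeping activations for every sample along every ray.

```python
# Back-of-envelope GPU memory estimate for end-to-end stylized-NeRF training.
# All numbers are illustrative assumptions, not values from the paper.
H, W = 512, 512            # rendered image resolution
samples_per_ray = 192      # coarse + fine samples along each camera ray
hidden_width = 256         # width of the NeRF MLP
bytes_per_float = 4        # fp32

# A style loss over the whole frame needs gradients through every sample,
# so the activations of (at least) one hidden layer per sample must be kept.
activations = H * W * samples_per_ray * hidden_width * bytes_per_float
print(f"~{activations / 2**30:.1f} GiB for a single hidden layer")  # ~48.0 GiB
```

Even this single-layer estimate exceeds the roughly 24 GB of a typical off-the-shelf GPU, and a full NeRF MLP stores activations for several such layers.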
Contribution
- We introduce a novel style transfer algorithm with neural implicit 3D scene representations, producing high-quality results with cross-view consistency
- We introduce a general, plug-and-play framework, where various implicit scene representations and stylization methods can be plugged in as a sub-module, enabling results on a variety of scenes: indoor scenes, outdoor scenes and 4D dynamic avatars
- We develop a novel training scheme that effectively reduces the GPU memory requirement during training, enabling high-resolution results on a single modern GPU
- Through both objective and subjective evaluations, we demonstrate that our method delivers better image and video quality than state-of-the-art methods
Related Work
Image and video style transfer; 3D style transfer; Novel view synthesis
Comparisons
AdaIN, WCT, LST, MCCNet, ReReVST, StyleScene
Overview
We propose an alternating training approach to stylize implicit scene representations. In each iteration: (1) given the current scene function (initially a pre-trained one), we render images from different views; (2) we stylize these images with the image stylization module; (3) we train the scene function to fit the stylized multi-view images, just as in standard NeRF training. The next iteration repeats these steps, now rendering from an increasingly stylized scene function, stylizing the new set of renders, and training the NeRF on them.
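The loop below is a minimal PyTorch-style sketch of this procedure, not the paper's actual code: `render_view`, `stylize`, and `sample_rays` are hypothetical stand-ins for whichever scene representation and stylization module are plugged in. The key memory-saving point is that steps (1) and (2) run without gradients, so only the standard per-ray NeRF fit in step (3) needs backpropagation.

```python
import torch

def alternating_stylization(nerf, stylize, cameras, n_outer=10, n_nerf_steps=1000):
    """Alternate between (re-)stylizing rendered views and fitting the NeRF to them.

    `render_view(nerf, cam)` and `sample_rays(cameras, targets, batch_size)` are
    hypothetical helpers standing in for the plugged-in scene representation's
    rendering and ray-sampling routines.
    """
    optimizer = torch.optim.Adam(nerf.parameters(), lr=5e-4)
    for _ in range(n_outer):
        # Steps (1) + (2): render the current (increasingly stylized) scene from
        # all training views, then stylize the frames. No gradients flow through
        # this stage, which is what keeps the memory footprint low.
        with torch.no_grad():
            targets = [stylize(render_view(nerf, cam)) for cam in cameras]

        # Step (3): fit the scene function to the stylized images, exactly like
        # standard NeRF training -- per-ray mini-batches with a photometric loss.
        for _ in range(n_nerf_steps):
            rays, target_rgb = sample_rays(cameras, targets, batch_size=4096)
            pred_rgb = nerf(rays)
            loss = torch.nn.functional.mse_loss(pred_rgb, target_rgb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return nerf
```

Because the stylization module only ever sees complete rendered images and the NeRF only ever fits fixed targets, either sub-module can be swapped out independently, which is what makes the framework plug-and-play.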