Authors
Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, Noha Radwan
Max Planck Institute for Intelligent Systems; University of Tübingen; Google Research
Abstract
Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.
Contribution
- A patch-based regularizer for depth maps rendered from unobserved viewpoints, which reduces floating artifacts and improves scene geometry (a minimal loss sketch follows this list)
- A normalizing flow model to regularize the colors predicted at unseen viewpoints by maximizing the log-likelihood of the rendered patches, thereby avoiding color shifts between different views
- An annealing strategy for sampling points along the ray, where we first sample scene content within a small range before expanding to the full scene bounds, which prevents divergence early during training (an annealing sketch also follows this list)
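The depth regularizer amounts to a smoothness penalty on small depth patches rendered from unobserved viewpoints. Below is a minimal PyTorch sketch under the assumption that `depth_patch` is an S×S depth map rendered from such a viewpoint; the function name and shapes are illustrative, not the authors' implementation.

```python
import torch

def depth_smoothness_loss(depth_patch: torch.Tensor) -> torch.Tensor:
    """Penalize squared depth differences between neighboring pixels of an (S, S) patch."""
    dx = depth_patch[:, :-1] - depth_patch[:, 1:]   # horizontal neighbors
    dy = depth_patch[:-1, :] - depth_patch[1:, :]   # vertical neighbors
    return (dx ** 2).mean() + (dy ** 2).mean()
```

Because the loss only compares neighboring depth values, it discourages isolated floaters while still allowing smooth depth variation across the patch.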
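The sample-space annealing can be sketched as a schedule that starts with a small interval around the midpoint of the scene bounds and linearly expands it to the full near/far range. The snippet below is a hedged sketch; the parameter names (`near`, `far`, `anneal_iters`, `p_start`) and the linear schedule are assumptions for illustration.

```python
def annealed_bounds(step: int, near: float, far: float,
                    anneal_iters: int = 512, p_start: float = 0.5):
    """Return (near_t, far_t) ray-sampling bounds to use at training iteration `step`."""
    mid = 0.5 * (near + far)                           # anchor the interval at the midpoint
    p = min(max(step / anneal_iters, p_start), 1.0)    # fraction of the full range, grows with step
    near_t = mid + (near - mid) * p
    far_t = mid + (far - mid) * p
    return near_t, far_t
```

Restricting samples to this shrunken interval early on keeps the optimization from placing density at the ray origins, which is one of the divergent solutions observed with sparse inputs.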
Related Works
Neural Representations; Sparse Input Novel-View Synthesis
Overview
NeRF optimizes the reconstruction loss for a given set of input images. For sparse inputs, however, this leads to degenerate solutions. In this work, we propose to sample unobserved views and regularize the geometry and appearance of patches rendered from those views. More specifically, we cast rays through the scene and render patches from unobserved viewpoints for a given radiance field f_θ. We then regularize appearance by feeding the predicted RGB patches through a trained normalizing flow model φ and maximizing the predicted log-likelihood. We regularize geometry by enforcing a smoothness loss on the rendered depth patches. Our approach leads to 3D-consistent representations even for sparse inputs, from which realistic novel views can be rendered.
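The appearance regularization reduces to a negative log-likelihood loss on rendered RGB patches under the flow model φ. The sketch below assumes a pre-trained flow object exposing a `log_prob` method for flattened patches; this interface is an assumption for illustration, not the authors' exact API.

```python
import torch

def appearance_nll_loss(rgb_patch: torch.Tensor, flow) -> torch.Tensor:
    """Negative log-likelihood of a rendered (S, S, 3) RGB patch under a pre-trained flow."""
    x = rgb_patch.reshape(1, -1)        # flatten the patch into a single sample
    return -flow.log_prob(x).mean()     # minimizing this maximizes the predicted log-likelihood
```

During training, this term is added to the reconstruction loss on the observed views and the depth smoothness term on unobserved views, pushing rendered patches toward the distribution of natural image patches the flow was trained on.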