Authors
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
Google Research; Harvard University
Summary
In this work, we present an extension of mip-NeRF that we call “mip-NeRF 360”, designed for real-world scenes with unconstrained camera orientations, which is capable of producing realistic renderings of such unbounded scenes. Using a novel Kalman-like scene parameterization, an efficient proposal-based coarse-to-fine distillation framework, and a regularizer designed for mip-NeRF ray intervals, we are able to synthesize realistic novel views and detailed depth maps for challenging unbounded real-world scenes.
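As a rough illustration of the scene parameterization, the sketch below implements the paper's contraction, which maps all of Euclidean space into a ball of radius 2 so that unbounded content can be represented in a bounded domain (the paper additionally applies this contraction to mip-NeRF's Gaussians via a Kalman-style linearization, which is omitted here; function and variable names are illustrative, not from the authors' code).

```python
import numpy as np

def contract(x, eps=1e-8):
    """Map an unbounded 3D point into a ball of radius 2.

    Points with norm <= 1 are left unchanged; more distant points are
    smoothly compressed so that points at infinity land on the radius-2 sphere.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe_norm = np.maximum(norm, eps)
    scaled = (2.0 - 1.0 / safe_norm) * (x / safe_norm)
    return np.where(norm <= 1.0, x, scaled)

# Example: a very distant point is pulled just inside the radius-2 boundary.
print(contract(np.array([1000.0, 0.0, 0.0])))  # ~[1.999, 0.0, 0.0]
```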
Abstract
Though neural radiance fields (NeRF) have demonstrated impressive view synthesis results on objects and small bounded regions of space, they struggle on "unbounded" scenes, where the camera may point in any direction and content may exist at any distance. In this setting, existing NeRF-like models often produce blurry or low-resolution renderings (due to the unbalanced detail and scale of nearby and distant objects), are slow to train, and may exhibit artifacts due to the inherent ambiguity of the task of reconstructing a large scene from a small set of images. We present an extension of mip-NeRF (a NeRF variant that addresses sampling and aliasing) that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes. Our model, which we dub "mip-NeRF 360" as we target scenes in which the camera rotates 360 degrees around a point, reduces mean-squared error by 57% compared to mip-NeRF, and is able to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes.
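To make the “distortion-based regularizer” concrete, here is a minimal sketch of a distortion loss over a single ray's histogram of interval edges and rendering weights, following the form described in the paper (a pairwise term that pulls concentrations of weight together plus a term that shrinks each interval's own weight); this is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def distortion_loss(s, w):
    """Distortion regularizer for one ray.

    s: interval edges in normalized ray distance, shape [n + 1]
    w: volume-rendering weights for each interval, shape [n]
    """
    midpoints = 0.5 * (s[1:] + s[:-1])   # interval midpoints, shape [n]
    widths = s[1:] - s[:-1]               # interval widths, shape [n]

    # Pairwise term: sum_ij w_i * w_j * |m_i - m_j|
    pairwise = np.abs(midpoints[:, None] - midpoints[None, :])
    loss_inter = np.sum(w[:, None] * w[None, :] * pairwise)

    # Self term: (1/3) * sum_i w_i^2 * (s_{i+1} - s_i)
    loss_intra = np.sum(w ** 2 * widths) / 3.0

    return loss_inter + loss_intra
```

Minimizing this loss encourages each ray's weights to collapse into a few compact, nearby intervals, which suppresses the "floater" and background-collapse artifacts that arise from the ambiguity of sparse observations.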
Contribution
- Parameterization. Unbounded 360-degree scenes can occupy an arbitrarily large region of Euclidean space, but mip-NeRF requires that 3D scene coordinates lie in a bounded domain
- Efficiency. Large and detailed scenes require more network capacity, but densely querying a large MLP along each ray during training is expensive
- Ambiguity. The content of unbounded scenes may lie at any distance and will be observed by only a small number of rays, exacerbating the inherent ambiguity of reconstructing 3D content from 2D images
Comparisons
NeRF, mip-NeRF, NeRF++, Deep Blending, Point-Based Neural Rendering, Stable View Synthesis
Overview
We use a “proposal MLP” that emits weights (but not color), which are used to resample intervals along each ray; in the final stage, a “NeRF MLP” produces the weights and colors that yield the rendered image, which we supervise. The proposal MLP is trained so that its proposal weights are consistent with the NeRF MLP’s output weights w. By pairing a small proposal MLP with a large NeRF MLP, we obtain a combined model with high capacity that is still tractable to train.
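Below is a heavily simplified sketch of one render with this two-MLP hierarchy. The callables `proposal_mlp` and `nerf_mlp` are assumptions for illustration (the proposal MLP returns only histogram weights, the NeRF MLP returns weights and colors), and the paper's bound-based distillation loss between the two histograms is only noted, not implemented.

```python
import numpy as np

def hierarchical_render(ray, proposal_mlp, nerf_mlp, rng, n_prop=64, n_final=32):
    # Stage 1: query the small proposal MLP on coarse, uniform intervals.
    t_prop = np.linspace(0.0, 1.0, n_prop + 1)     # interval edges
    w_prop = proposal_mlp(ray, t_prop)              # proposal weights, shape [n_prop]

    # Resample new interval edges in proportion to the proposal weights
    # (inverse-transform sampling of the piecewise-constant PDF).
    pdf = w_prop / np.maximum(w_prop.sum(), 1e-8)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = np.sort(rng.uniform(size=n_final + 1))
    t_final = np.interp(u, cdf, t_prop)

    # Stage 2: query the large NeRF MLP only on the resampled intervals.
    w_final, colors = nerf_mlp(ray, t_final)        # weights [n_final], colors [n_final, 3]
    rgb = (w_final[:, None] * colors).sum(axis=0)   # volume-rendered pixel color

    # Training would add (a) a photometric loss on rgb and (b) an online
    # distillation loss that pushes (t_prop, w_prop) to be consistent with
    # (t_final, w_final), so the cheap proposal MLP mimics where the large
    # NeRF MLP places its weight.
    return rgb, (t_prop, w_prop), (t_final, w_final)
```

The key efficiency point is that the large NeRF MLP is evaluated only on the few resampled intervals, while the many coarse queries go through the much smaller proposal MLP.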