Authors
Sai Bi, Zexiang Xu, Kalyan Sunkavalli, Miloš Hašan
University of California, San Diego; Adobe Research
Abstract
We present a deep learning approach to reconstruct scene appearance from unstructured images captured under collocated point lighting. At the heart of Deep Reflectance Volumes is a novel volumetric scene representation consisting of opacity, surface normal and reflectance voxel grids. We present a novel physically-based differentiable volume ray marching framework to render these scene volumes under arbitrary viewpoint and lighting. This allows us to optimize the scene volumes to minimize the error between their rendered images and the captured images. Our method is able to reconstruct real scenes with challenging non-Lambertian reflectance and complex geometry with occlusions and shadowing. Moreover, it accurately generalizes to novel viewpoints and lighting, including non-collocated lighting, rendering photorealistic images that are significantly better than state-of-the-art mesh-based methods. We also show that our learned reflectance volumes are editable, allowing for modifying the materials of the captured scenes.
Contributions
- A practical neural rendering framework that reproduces high-quality geometry and appearance from unstructured mobile phone flash images and enables view synthesis, relighting, and scene editing
- A novel scene appearance representation using opacity, normal and reflectance volumes (see the code sketch after this list)
- A physically-based differentiable volume rendering approach based on deep priors that can effectively reconstruct the volumes from input flash images
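To make the representation and the deep prior concrete, here is a minimal sketch in PyTorch of opacity/normal/reflectance voxel grids decoded from a learned encoding vector by a 3D convolutional network. The class name VolumeDecoder, the layer sizes, the output resolution, and the choice of four reflectance channels are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VolumeDecoder(nn.Module):
    """Decode a global encoding vector into opacity / normal / reflectance grids."""
    def __init__(self, code_dim=512, base_res=4, out_res=64):
        super().__init__()
        self.base_res = base_res
        self.fc = nn.Linear(code_dim, 128 * base_res ** 3)
        layers, ch, res = [], 128, base_res
        while res < out_res:
            layers += [nn.ConvTranspose3d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch //= 2
            res *= 2
        self.upsample = nn.Sequential(*layers)
        # Per-voxel outputs: 1 opacity + 3 normal + 4 reflectance coefficients
        # (e.g. diffuse albedo plus a roughness term for a microfacet BRDF).
        self.head = nn.Conv3d(ch, 1 + 3 + 4, kernel_size=3, padding=1)

    def forward(self, code):
        x = self.fc(code).view(1, 128, self.base_res, self.base_res, self.base_res)
        x = self.head(self.upsample(x))
        alpha = torch.sigmoid(x[:, :1])           # opacity in [0, 1]
        normal = F.normalize(x[:, 1:4], dim=1)    # unit surface normals
        reflectance = torch.sigmoid(x[:, 4:])     # material coefficients in [0, 1]
        return alpha, normal, reflectance

# Both the encoding vector and the decoder weights are optimized against the
# captured flash images, so the network acts as a deep prior on the volumes.
code = nn.Parameter(torch.randn(1, 512))
decoder = VolumeDecoder()
alpha, normal, reflectance = decoder(code)
```

The decoder is only a prior on the volumes; the actual supervision comes from rendering them and comparing against the captured images, as described in the Overview.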
Related Work
Geometry reconstruction; Reflectance acquisition; Relighting and view synthesis
Comparisons
DeepVoxels
Overview
We propose the Deep Reflectance Volumes representation to capture scene geometry and appearance, where each voxel stores an opacity α, a normal n and reflectance (material) coefficients R. To render, we march a ray through each pixel and accumulate contributions from every sample point x_s along the ray; each contribution is computed from the local normal, reflectance and lighting at that point. We accumulate opacity along both the camera ray (transmittance τ_c) and the light ray (transmittance τ_l) to model the light lost to occlusions and shadows. To predict such a volume, we start from an encoding vector and decode it into a volume with a 3D convolutional neural network, so the combination of the encoding vector and the network weights is the unknown being optimized (trained). We train on images captured with a collocated camera and light by minimizing a loss between the rendered images and the captured images.
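The rendering step above can be sketched as a simplified, differentiable ray marcher for the collocated camera/flash setting. The snippet samples the three volumes along each camera ray, accumulates transmittance toward the camera (which, for a collocated light, doubles as the shadowing term toward the light), and shades each sample with a Lambertian approximation instead of the paper's full BRDF. Function names, the fixed step count, the light intensity, and the assumption that the scene sits inside the [-1, 1]^3 grid are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def sample_volume(vol, pts):
    """Trilinearly sample a (1, C, D, H, W) volume at points given in [-1, 1]^3.

    Assumes the point coordinates already follow grid_sample's (x, y, z) axis
    convention for the stored volume.
    """
    n_rays, n_steps, _ = pts.shape
    grid = pts.view(1, n_rays, n_steps, 1, 3)
    out = F.grid_sample(vol, grid, mode='bilinear', align_corners=True)
    return out.squeeze(0).squeeze(-1).permute(1, 2, 0)       # (n_rays, n_steps, C)

def render_collocated(alpha_vol, normal_vol, refl_vol, rays_o, rays_d,
                      light_intensity=10.0, near=0.5, far=2.5, n_steps=128):
    """March rays from a collocated camera / point light through the volumes."""
    t = torch.linspace(near, far, n_steps, device=rays_o.device)
    pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]   # (R, S, 3)

    alpha = sample_volume(alpha_vol, pts)[..., 0]                      # (R, S)
    normal = F.normalize(sample_volume(normal_vol, pts), dim=-1)       # (R, S, 3)
    refl = sample_volume(refl_vol, pts)                                # (R, S, C)

    # Transmittance toward the camera: product of (1 - alpha) over earlier samples.
    trans_c = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1), dim=1)
    # Collocated light: the light ray coincides with the camera ray,
    # so the shadowing transmittance equals the camera transmittance.
    trans_l = trans_c

    # Simple Lambertian shading with inverse-square falloff; the paper uses a
    # full reflectance model here.
    wi = -rays_d[:, None, :]                              # direction toward the light
    cos = (normal * wi).sum(-1).clamp(min=0.0)            # (R, S)
    dist2 = (pts - rays_o[:, None, :]).pow(2).sum(-1).clamp(min=1e-6)
    radiance = refl[..., :3] * (light_intensity * cos / dist2)[..., None]

    weights = trans_c * alpha * trans_l                   # per-sample contribution
    return (weights[..., None] * radiance).sum(dim=1)     # (R, 3) pixel colors
```

Because every operation here is differentiable, an L2 loss between the rendered pixels and the captured flash images can be backpropagated to both the encoding vector and the decoder weights, matching the optimization described above.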