Authors
Jon Hasselgren, Nikolai Hofmann, Jacob Munkberg
NVIDIA
Summary
We learn topology, materials, and environment map lighting jointly from 2D supervision. We directly optimize the topology of a triangle mesh, learn materials through volumetric texturing, and leverage Monte Carlo rendering and denoising. Our output representation is a triangle mesh with spatially varying 2D textures and a high dynamic range environment map, which can be used unmodified in standard game engines. Knob model by Yasutoshi Mori, adapted by Morgan McGuire.
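To make the output representation concrete, below is a minimal sketch, with hypothetical field names (not the actual export format), of the extracted assets: a triangle mesh with spatially varying 2D PBR textures and an HDR environment map.

# Minimal sketch of the extracted assets described above.
# Field names are hypothetical, not the actual export format.
from dataclasses import dataclass
import numpy as np

@dataclass
class ExtractedScene:
    vertices: np.ndarray  # (V, 3) float32 vertex positions
    faces: np.ndarray     # (F, 3) int32 triangle indices
    uvs: np.ndarray       # (V, 2) float32 texture coordinates
    kd: np.ndarray        # (H, W, 3) base color texture
    ks: np.ndarray        # (H, W, 3) occlusion/roughness/metalness texture
    normal: np.ndarray    # (H, W, 3) tangent-space normal map
    envmap: np.ndarray    # (He, We, 3) float32 HDR lat-long environment map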
Abstract
Recent advances in differentiable rendering have enabled high-quality reconstruction of 3D scenes from multi-view images. Most methods rely on simple rendering algorithms: pre-filtered direct lighting or learned representations of irradiance. We show that a more realistic shading model, incorporating ray tracing and Monte Carlo integration, substantially improves decomposition into shape, materials, and lighting. Unfortunately, Monte Carlo integration provides estimates with significant noise, even at large sample counts, which makes gradient-based inverse rendering very challenging. To address this, we incorporate multiple importance sampling and denoising in a novel inverse rendering pipeline. This substantially improves convergence and enables gradient-based optimization at low sample counts. We present an efficient method to jointly reconstruct geometry (explicit triangle meshes), materials, and lighting, which substantially improves material and light separation compared to previous work. We argue that denoising can become an integral part of high-quality inverse rendering pipelines.
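As a concrete illustration of the shading model, here is a small sketch, not the paper's implementation, of a multiple-importance-sampled Monte Carlo estimator for direct lighting at one shading point; the noisy low-sample-count result is what the denoiser later cleans up. sample_light and sample_brdf are assumed helper callables returning the sampled contribution together with the pdfs of both strategies.

# Sketch of MIS-weighted Monte Carlo direct lighting at one shading point.
# sample_light / sample_brdf are assumed helpers, not a real renderer API.
import numpy as np

def balance_heuristic(pdf_a, pdf_b):
    # MIS weight for a sample drawn from strategy "a".
    return pdf_a / (pdf_a + pdf_b)

def direct_light_estimate(sample_light, sample_brdf, n_samples=4):
    # Each helper returns (f * cos * Li, pdf_light, pdf_brdf)
    # for one sampled direction.
    radiance = np.zeros(3)
    for _ in range(n_samples):
        # Strategy 1: sample the environment light.
        contrib, p_l, p_b = sample_light()
        if p_l > 0.0:
            radiance += balance_heuristic(p_l, p_b) * contrib / p_l
        # Strategy 2: sample the BRDF.
        contrib, p_l, p_b = sample_brdf()
        if p_b > 0.0:
            radiance += balance_heuristic(p_b, p_l) * contrib / p_b
    return radiance / n_samples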
Related Work
Related work includes neural methods for multi-view reconstruction, BRDF and lighting estimation, and image denoisers.
Overview
We extend NVDIFFREC with a differentiable Monte Carlo renderer for direct illumination and, to reduce variance, a differentiable denoiser. Following NVDIFFREC, the topology is parameterized using an SDF, and a triangular surface mesh is extracted in each iteration using DMTet, combined with spatially varying PBR materials and HDR environment lighting. The system is supervised using only a photometric loss between the rendered, denoised image and a reference, and gradients are back-propagated to the denoiser, shape, material, and lighting parameters. All parameters are optimized jointly.
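A schematic sketch of this joint optimization, written as PyTorch-style pseudocode with hypothetical stand-ins (extract_mesh_dmtet, render_mc, denoiser) rather than the actual nvdiffrecmc API:

# Schematic sketch of the joint optimization loop described above.
# extract_mesh_dmtet and render_mc are hypothetical stand-ins for
# differentiable mesh extraction and Monte Carlo rendering.
import torch
import torch.nn.functional as F

def optimize_scene(sdf_params, material_params, envmap_params, denoiser,
                   dataloader, extract_mesh_dmtet, render_mc, lr=1e-3):
    params = (list(sdf_params) + list(material_params)
              + list(envmap_params) + list(denoiser.parameters()))
    optimizer = torch.optim.Adam(params, lr=lr)
    for cam, ref_img in dataloader:
        # Extract a triangle mesh from the SDF with DMTet every iteration.
        mesh = extract_mesh_dmtet(sdf_params)
        # Render direct illumination with a differentiable Monte Carlo
        # renderer at a low sample count (noisy estimate).
        noisy = render_mc(mesh, material_params, envmap_params, cam, spp=4)
        # Denoise; gradients flow through the differentiable denoiser.
        img = denoiser(noisy)
        # Photometric loss against the reference drives shape, material,
        # lighting, and denoiser parameters jointly.
        loss = F.l1_loss(img, ref_img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()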