Authors
Abhimitra Meka, Rohit Pandey, Christian Haene, Sergio Orts-Escolano, Peter Barnum, Philip Davidson, Daniel Erickson, Yinda Zhang, Jonathan Taylor, Sofien Bouaziz, Chloe Legendre, Wan-Chun Ma, Ryan Overbeck, Thabo Beeler, Paul Debevec, Shahram Izadi, Christian Theobalt, Christoph Rhemann, Sean Fanello
Google; MPI Informatics; Saarland Informatics Campus
Summary
Deep Relightable Textures – Our method is able to photo-realistically synthesize and composite dynamic performers under any lighting condition from a desired camera viewpoint. Our framework presents a significant step towards bridging the gap between image-based rendering methods and volumetric videos, enabling exciting possibilities in mixed reality productions.
Abstract
The increasing demand for 3D content in augmented and virtual reality has motivated the development of volumetric performance capture systems such as the Light Stage. Recent advances are pushing free viewpoint relightable videos of dynamic human performances closer to photorealistic quality. However, despite significant efforts, these sophisticated systems are limited by reconstruction and rendering algorithms which do not fully model complex 3D structures and higher order light transport effects such as global illumination and sub-surface scattering. In this paper, we propose a system that combines traditional geometric pipelines with a neural rendering scheme to generate photorealistic renderings of dynamic performances under desired viewpoint and lighting. Our system leverages deep neural networks that model the classical rendering process to learn implicit features that represent the view-dependent appearance of the subject independent of the geometry layout, allowing for generalization to unseen subject poses and even novel subject identities. Detailed experiments and comparisons demonstrate the efficacy and versatility of our method to generate high-quality results, significantly outperforming existing state-of-the-art solutions.
Contribution
- A volumetric capture framework that leverages neural rendering to synthesize photorealistic humans from arbitrary viewpoints under desired illumination conditions
- An approach to build neural textures from multi-view images that renders the full reflectance field for unseen dynamic performances of humans, including occlusion shadows and an alpha compositing mask. Because the texture-space network operates per texel, it overcomes the limitation of previous neural-texture methods, which must be re-trained for every new UV parameterization (see the sketch after this list)
- High-quality results on free-viewpoint videos of dynamic performers, together with extensive evaluations and comparisons that demonstrate the efficacy of the method and substantial improvements over existing state-of-the-art systems
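The parameterization-agnostic behavior of the texture-space network follows from the fact that 1 × 1 convolutions act on each texel independently. Below is a minimal PyTorch sketch (not the authors' code; layer names, channel counts, and resolutions are illustrative assumptions) showing that permuting the texels of a feature texture, i.e. simulating a different UV layout of the same surface points, simply permutes the output in the same way.

```python
import torch
import torch.nn as nn

# Hypothetical per-texel feature transform: every layer uses kernel_size=1.
texture_net = nn.Sequential(
    nn.Conv2d(16, 64, kernel_size=1),  # 16 input feature channels per texel
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=1),   # 8 output neural-texture channels
)

tex_a = torch.randn(1, 16, 512, 512)   # features laid out in one UV atlas
perm = torch.randperm(512 * 512)       # re-shuffling texels ~ a different UV layout
tex_b = tex_a.reshape(1, 16, -1)[:, :, perm].reshape(1, 16, 512, 512)

out_a = texture_net(tex_a).reshape(1, 8, -1)[:, :, perm]
out_b = texture_net(tex_b).reshape(1, 8, -1)
print(torch.allclose(out_a, out_b))    # True: the output depends only on the texel,
                                       # not on where it sits in the atlas
```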
Related Works
Multi-view 3D Performance Capture; Full-body Performance Capture; Neural Rendering
Overview
We propose a neural rendering pipeline for rendering moving humans from any desired viewpoint under any desired lighting. Features are extracted from the raw camera images and then pooled in UV space with a learned blending function. 1 × 1 convolutions in texture space allow the method to generalize to different UV parameterizations. A final neural rendering network synthesizes the image in camera space. See the paper for details.
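As a rough illustration of how these stages fit together, here is a hedged PyTorch sketch of the pipeline: per-view image features, UV-space pooling with a learned blending over views, a per-texel (1 × 1 convolution) texture network, and a camera-space neural renderer producing an RGB image plus an alpha compositing mask. All module names, channel counts, and the placeholder UV/screen lookup grids (which in the real system would come from the geometry proxy and rasterization) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageFeatureNet(nn.Module):
    """Extracts per-pixel features from each raw camera image."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class TexelNet(nn.Module):
    """1x1 convolutions in texture space: per-texel, hence UV-layout agnostic."""
    def __init__(self, in_ch, out_ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 1),
        )
    def forward(self, x):
        return self.net(x)

class NeuralRenderer(nn.Module):
    """Camera-space network turning the re-projected neural texture into an image."""
    def __init__(self, in_ch=8, out_ch=4):  # RGB + alpha compositing mask
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def pool_to_uv(img_feats, texel_to_image_grids, blend_logits):
    """Warp each view's image features into UV space and blend with learned weights.

    img_feats:            (V, C, H, W)  per-view image features
    texel_to_image_grids: (V, T, T, 2)  image coords of each texel (from the geometry proxy)
    blend_logits:         (V, 1, T, T)  per-view blending scores (e.g. visibility-aware)
    """
    uv_feats = F.grid_sample(img_feats, texel_to_image_grids, align_corners=False)
    weights = torch.softmax(blend_logits, dim=0)          # learned blending over views
    return (weights * uv_feats).sum(dim=0, keepdim=True)  # (1, C, T, T)

# Toy shapes: 4 views, 256x256 images, 128x128 texture atlas.
V, H, W, T = 4, 256, 256, 128
images = torch.randn(V, 3, H, W)
grids = torch.rand(V, T, T, 2) * 2 - 1        # placeholder for rasterized UV->image lookups
logits = torch.randn(V, 1, T, T)              # placeholder for a learned blending function

feat_net, tex_net, renderer = ImageFeatureNet(), TexelNet(16), NeuralRenderer()
uv_features = pool_to_uv(feat_net(images), grids, logits)   # pooled in UV space
neural_texture = tex_net(uv_features)                       # per-texel neural texture
screen_grid = torch.rand(1, H, W, 2) * 2 - 1  # placeholder for texture->camera rasterization
screen_feats = F.grid_sample(neural_texture, screen_grid, align_corners=False)
rgba = renderer(screen_feats)                 # (1, 4, H, W): image + compositing mask
```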