Authors
Justus Thies, Michael Zollhöfer, Matthias Nießner
Technical University of Munich; Stanford University
Summary
We present an image synthesis approach that learns object-specific neural textures which can be interpreted by a neural renderer. Our approach can be trained end-to-end with real data, allowing us to re-synthesize novel views of static objects, edit scenes, and re-render dynamic animated surfaces.
Abstract
The modern computer graphics pipeline can synthesize images at remarkable visual quality; however, it requires well-defined, high-quality 3D content as input. In this work, we explore the use of imperfect 3D content, for instance, obtained from photometric reconstructions with noisy and incomplete surface geometry, while still aiming to produce photo-realistic (re-)renderings. To address this challenging problem, we introduce Deferred Neural Rendering, a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable components. Specifically, we propose Neural Textures, which are learned feature maps that are trained as part of the scene capture process. Similar to traditional textures, neural textures are stored as maps on top of 3D mesh proxies; however, the high-dimensional feature maps contain significantly more information, which can be interpreted by our new deferred neural rendering pipeline. Both the neural textures and the deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect. In contrast to traditional, black-box 2D generative neural networks, our 3D representation gives us explicit control over the generated output and allows for a wide range of application domains. For instance, we can synthesize temporally consistent video re-renderings of recorded 3D scenes as our representation is inherently embedded in 3D space. This way, neural textures can be utilized to coherently re-render or manipulate existing video content in both static and dynamic environments at real-time rates. We show the effectiveness of our approach in several experiments on novel view synthesis, scene editing, and facial reenactment, and compare to state-of-the-art approaches that leverage the standard graphics pipeline as well as conventional generative neural networks.
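To make the core idea concrete, below is a minimal PyTorch-style sketch of a neural texture: a learnable, high-dimensional feature map that is bilinearly sampled with the per-pixel uv coordinates produced by rasterizing the mesh proxy. The class name, resolution, and channel count are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTexture(nn.Module):
    """A learnable feature map stored on top of a 3D mesh proxy.

    Unlike a conventional RGB texture, each texel holds a
    high-dimensional feature vector that is optimized end-to-end
    together with the renderer. Resolution and channel count here
    are illustrative choices, not prescribed by the paper.
    """

    def __init__(self, resolution=512, channels=16):
        super().__init__()
        # Texels start as small random values and are learned
        # as part of the scene capture process.
        self.data = nn.Parameter(torch.randn(1, channels, resolution, resolution) * 0.01)

    def forward(self, uv):
        # uv: (B, H, W, 2) per-pixel texture coordinates in [0, 1],
        # produced by rasterizing the mesh proxy with the standard
        # graphics pipeline. grid_sample expects coords in [-1, 1].
        grid = uv * 2.0 - 1.0
        tex = self.data.expand(uv.shape[0], -1, -1, -1)
        # Bilinear lookup of per-pixel feature vectors yields a
        # view-dependent screen-space feature map of shape (B, C, H, W).
        return F.grid_sample(tex, grid, mode="bilinear", align_corners=False)
```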
Contribution
- Neural Rendering for photo-realistic image synthesis based on imperfect commodity 3D reconstructions at real-time rates
- Neural Textures for novel view synthesis in static scenes and for editing dynamic objects
- An end-to-end learned deferred neural rendering pipeline that achieves this by combining insights from traditional graphics with learnable components
Related Works
Novel-view Synthesis from RGB-D Scans; Image-based Rendering; Light-field Rendering; Image Synthesis using Neural Networks; View Synthesis using Neural Networks
Overview
Overview of our neural rendering pipeline: Given an object with a valid uv-map parameterization and an associated Neural Texture map as input, the standard graphics pipeline is used to render a view-dependent screen-space feature map. This screen-space feature map is then converted to photo-realistic imagery by the Deferred Neural Renderer. Our approach is trained end-to-end to find the best renderer and texture map for a given task.
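The following hypothetical sketch (reusing the NeuralTexture class from the earlier example) illustrates one end-to-end training step of this pipeline: rasterized uv coordinates index the neural texture to produce a screen-space feature map, a stand-in convolutional renderer converts it to RGB, and an L1 photometric loss drives gradients into both the renderer weights and the texture itself. The paper's renderer is a deeper U-Net-style network; the shallow stack and the loss below are simplifying assumptions.

```python
import torch
import torch.nn as nn

class DeferredNeuralRenderer(nn.Module):
    """Translates a screen-space feature map into an RGB image.

    The paper uses a U-Net-style network for this stage; this shallow
    convolutional stack is only a stand-in to keep the sketch short.
    """

    def __init__(self, in_channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, feature_map):
        return self.net(feature_map)

# Joint optimization of texture and renderer (end-to-end training).
texture = NeuralTexture()  # from the earlier sketch
renderer = DeferredNeuralRenderer()
optimizer = torch.optim.Adam(
    list(texture.parameters()) + list(renderer.parameters()), lr=1e-3
)

def training_step(uv, target_image):
    # uv: (B, H, W, 2) per-pixel texture coordinates, rasterized from the
    # mesh proxy by the standard graphics pipeline for the current view.
    features = texture(uv)                 # screen-space feature map
    prediction = renderer(features)        # photo-realistic re-rendering
    loss = (prediction - target_image).abs().mean()  # L1 photometric loss
    optimizer.zero_grad()
    loss.backward()                        # gradients flow into both the
    optimizer.step()                       # renderer weights and the texels
    return loss.item()
```

Because the loss back-propagates through the bilinear texture lookup into the texels, the texture and renderer are optimized jointly, which is what lets the learned features compensate for noise and holes in the underlying 3D reconstruction.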