Authors
Peng Dai, Zhuwen Li, Yinda Zhang, Shuaicheng Liu, Bing Zeng
University of Electronic Science and Technology of China; Nuro Inc.; Google Research
Abstract
Physically based rendering has been widely used to generate photo-realistic images, benefiting industry by providing appealing renderings for applications such as entertainment and augmented reality, and academia by supplying large-scale, high-fidelity synthetic training data for data-hungry methods such as deep learning. However, physically based rendering relies heavily on ray tracing, which is computationally expensive in complicated environments and hard to parallelize. In this paper, we propose an end-to-end deep learning based approach that generates physically based renderings efficiently. Our system consists of two stacked neural networks that effectively simulate the physical behavior of the rendering process and produce photo-realistic images. The first network, the shading network, is designed to predict the optimal shading image from surface normal, depth and illumination; the second network, the composition network, learns to combine the predicted shading image with the reflectance to generate the final result. Our approach is inspired by intrinsic image decomposition, which makes shading a physically reasonable choice of intermediate supervision. Extensive experiments show that our approach is robust to noise thanks to a modified perceptual loss, and that it even outperforms physically based rendering systems in complex scenes given a reasonable time budget.
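The two-stage design described above can be sketched as follows. This is a minimal illustration only: the simple convolutional blocks, the channel widths, and the assumption that the illumination panoramas are resampled to the frame resolution are placeholders, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class ShadingNet(nn.Module):
    """Predicts a shading image from normal, depth and illumination inputs."""
    def __init__(self):
        super().__init__()
        # 3-ch normal + 1-ch depth + 2-ch illumination = 6 input channels.
        self.body = nn.Sequential(conv_block(6, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 3, kernel_size=3, padding=1))

    def forward(self, normal, depth, illumination):
        return self.body(torch.cat([normal, depth, illumination], dim=1))

class CompositionNet(nn.Module):
    """Combines the predicted shading with reflectance into the final image."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(6, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 3, kernel_size=3, padding=1))

    def forward(self, shading, reflectance):
        return self.body(torch.cat([shading, reflectance], dim=1))

# Dummy forward pass; the illumination panoramas are assumed (for this sketch)
# to be resampled to the frame resolution so all maps concatenate channel-wise.
shading_net, composition_net = ShadingNet(), CompositionNet()
normal       = torch.rand(1, 3, 256, 256)
depth        = torch.rand(1, 1, 256, 256)
illumination = torch.rand(1, 2, 256, 256)   # distance + intensity channels
reflectance  = torch.rand(1, 3, 256, 256)
shading = shading_net(normal, depth, illumination)   # intermediate supervision
image   = composition_net(shading, reflectance)      # final rendered image
```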
Contribution
- First, we propose a deep learning framework that efficiently generates high-quality PBR results at up to nearly real-time speed
- Second, we empirically find a combination of different layers in the perceptual loss that helps avoid artifacts in the result (see the sketch after this list)
- Third, our network estimates shading rather than the rendered image directly, which lets the network tackle the most computationally expensive component of the rendering process and focus on generating shading without distraction
- Last, we train a network to combine shading and reflectance into the final color image, which generates higher-quality results compared to traditional methods
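The sketch below illustrates the idea of combining several feature layers in a perceptual loss. It assumes a recent torchvision with downloadable ImageNet VGG-16 weights, and the specific layers chosen here (relu1_2, relu2_2, relu3_3) and their weights are illustrative assumptions, not the combination found in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class MultiLayerPerceptualLoss(nn.Module):
    """Perceptual loss that sums feature differences from several VGG layers."""
    def __init__(self, layer_ids=(3, 8, 15), layer_weights=(1.0, 1.0, 1.0)):
        super().__init__()
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
        # Split the VGG trunk into consecutive slices ending at each chosen layer.
        self.slices, prev = nn.ModuleList(), 0
        for idx in layer_ids:
            self.slices.append(nn.Sequential(*features[prev:idx + 1]))
            prev = idx + 1
        self.layer_weights = layer_weights
        for p in self.parameters():
            p.requires_grad_(False)  # VGG stays a fixed feature extractor

    def forward(self, prediction, target):
        loss, x, y = 0.0, prediction, target
        for w, block in zip(self.layer_weights, self.slices):
            x, y = block(x), block(y)
            loss = loss + w * nn.functional.l1_loss(x, y)
        return loss

# Example: compare a predicted frame against a PBR reference frame.
criterion = MultiLayerPerceptualLoss()
pred = torch.rand(1, 3, 256, 256)
ref  = torch.rand(1, 3, 256, 256)
print(criterion(pred, ref))
```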
Related Works
Physically Based Rendering; Photo-Realistic Image Generation; Intrinsic Image Decomposition
Comparisons
pix2pix, CAN, CycleGAN, U-Net, Mitsuba, OpenGL
Overview
Given appropriate rendering resources, such as geometry, lights, albedo, etc., many off-the-shelf physically based renderers, such as Blender [30], Mitsuba [31] and Maya [32], can produce photo-realistic images that are indistinguishable from real-world photos if there is no running-time limit. Our goal is to design a neural network architecture that takes these rendering resources as input and efficiently produces photo-realistic images of similar quality to those from PBR. Fortunately, most of the rendering input data can be represented as 2D images, which allows us to use the well-known convolutional neural network (CNN) architecture. Specifically, in our work, scene geometry is encoded in 2D depth and normal maps; illumination is encoded in two 1-channel panoramic illumination images with distance and intensity values; albedo is encoded in a reflectance map. Note that all of these sources can be rendered extremely fast through typical rasterization, and their time consumption is negligible compared to the PBR itself.
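The channel layout described above could be packed for a CNN as in the sketch below. Only the channel structure (normal, depth, two 1-channel panoramic illumination maps, reflectance) follows the text; the 256x256 frame size and 64x128 panorama resolution are placeholder assumptions.

```python
import torch

# Rasterized rendering resources encoded as 2D maps (batch of one frame).
inputs = {
    "normal":      torch.rand(1, 3, 256, 256),  # per-pixel surface normals
    "depth":       torch.rand(1, 1, 256, 256),  # per-pixel depth
    "illum_dist":  torch.rand(1, 1, 64, 128),   # panoramic illumination: distance
    "illum_inten": torch.rand(1, 1, 64, 128),   # panoramic illumination: intensity
    "reflectance": torch.rand(1, 3, 256, 256),  # albedo / reflectance map
}
```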