Authors
Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra, Tobias Ritschel
University College London; Adobe Research; Imperial College London
Abstract
We learn a latent space for easy capture, consistent interpolation, and efficient reproduction of visual material appearance. When a user provides a photo of a stationary natural material captured under flash illumination, it is first converted into a latent material code. In a second step, conditioned on this material code, our method produces an infinite and diverse spatial field of BRDF model parameters (diffuse albedo, normals, roughness, specular albedo) that subsequently allows rendering in complex scenes and illuminations, matching the appearance of the input photograph. Technically, we jointly embed all flash images into a latent space using a convolutional encoder and, conditioned on these latent codes, convert random spatial fields into fields of BRDF parameters using a convolutional neural network (CNN). We train the network so that these BRDF parameters match the visual characteristics (statistics and spectra of visual features) of the input under matching light. A user study compares our approach favorably to previous work, including methods with access to BRDF supervision.
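The training signal described above compares re-renderings of the generated BRDF fields to the input photo through feature statistics rather than raw pixels. The exact loss is not spelled out on this page, so the sketch below uses Gram matrices of pretrained VGG-19 activations as one common stand-in for "statistics of visual features"; the VGG backbone, the chosen layer indices, and the function names are illustrative assumptions, not the authors' definitive implementation.

```python
# Illustrative sketch (an assumption, not necessarily the authors' exact loss):
# compare "statistics of visual features" of a re-rendering and the input photo
# via Gram matrices of pretrained VGG-19 activations.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gram_matrix(feat):
    # feat: (B, C, H, W) -> per-image channel-correlation matrix (B, C, C)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def statistics_loss(rendering, photo, layers=(3, 8, 17, 26)):
    # Both inputs: (B, 3, H, W), ImageNet-normalized RGB.
    # The layer indices are illustrative picks of intermediate ReLU activations.
    loss, x, y = 0.0, rendering, photo
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in layers:
            loss = loss + torch.mean((gram_matrix(x) - gram_matrix(y)) ** 2)
        if i >= max(layers):
            break
    return loss
```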
Contribution
- a generative model of a BRDF material texture space
- generation of maps that are diverse over the infinite plane (see the sketch after this list)
- a flash image dataset of materials enabling our training with no BRDF parameter supervision or synthetic data
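The second contribution relies on a noise field defined over the unbounded plane from which arbitrary crops can be taken. This page does not state how that field is realized, so the following is a hypothetical sketch: per-pixel Gaussian noise derived deterministically from integer coordinates through an integer hash and a Box-Muller transform, so overlapping crops agree while distant crops remain diverse.

```python
# Hypothetical sketch of an "infinite" random field: each pixel's Gaussian noise
# is derived deterministically from its integer coordinates by an integer hash
# followed by a Box-Muller transform. The hash constants and the construction
# itself are illustrative assumptions, not the paper's.
import numpy as np

U = np.uint64  # keep all hashing in unsigned 64-bit arithmetic

def _hash_to_unit(ix, iy, stream):
    # Mix coordinates and a stream id into a pseudo-random uniform in (0, 1).
    h = ix * U(73856093) ^ iy * U(19349663) ^ U(stream) * U(83492791)
    h ^= h >> U(33)
    h *= U(0xFF51AFD7ED558CCD)
    h ^= h >> U(33)
    return (h >> U(11)).astype(np.float64) / float(2**53) + 1e-12

def noise_crop(x0, y0, height, width, channels=8, seed=0):
    # Crop [y0, y0+height) x [x0, x0+width) of the unbounded noise field.
    ys, xs = np.meshgrid(np.arange(y0, y0 + height), np.arange(x0, x0 + width),
                         indexing="ij")
    ys, xs = ys.astype(U), xs.astype(U)
    out = np.empty((channels, height, width), dtype=np.float32)
    for c in range(channels):
        u1 = _hash_to_unit(xs, ys, seed * 1000 + 2 * c)
        u2 = _hash_to_unit(xs, ys, seed * 1000 + 2 * c + 1)
        # Box-Muller: two uniforms -> one standard-normal sample per pixel.
        out[c] = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
    return out

# Two crops sharing a region reproduce identical noise in the overlap.
a = noise_crop(0, 0, 128, 128)
b = noise_crop(64, 64, 128, 128)
assert np.allclose(a[:, 64:, 64:], b[:, :64, :64])
```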
Related Works
Textures in Graphics; Material Modeling
Overview
Starting from an exemplar (top-left), our trained encoder maps the image to a compact latent variable z. Additionally, a crop with the same spatial dimensions as the flash input image is taken from a random infinite field. The noise crop is then transformed by a convolutional U-Net architecture. Each convolution in the network is followed by an Adaptive Instance Normalization (AdaIN) layer [Huang and Belongie 2017] that reshapes the statistics (mean μ and standard deviation σ) of the features. A learned affine transformation per layer maps z to the desired μ-s and σ-s. The outputs of the network are the diffuse, specular, roughness, and normal parameters of an svBRDF that, when rendered under a camera-colocated flash light, looks the same as the input. Because our setting is unsupervised, the trained network can be fine-tuned on the specific materials to be acquired.
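Below is a minimal sketch of this conditioning mechanism, assuming a PyTorch implementation: a learned affine layer maps the latent code z to per-channel means μ and standard deviations σ that reshape feature statistics after each convolution, as in AdaIN [Huang and Belongie 2017]. The block and its name AdaINConvBlock are illustrative, not the authors' exact architecture; a full U-Net would stack such blocks and end with output heads for the diffuse, specular, roughness, and normal maps.

```python
# Minimal sketch of one AdaIN-conditioned convolution block (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaINConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, z_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Learned affine transformation: z -> (mu, sigma) per feature channel.
        self.affine = nn.Linear(z_dim, 2 * out_ch)

    def forward(self, x, z):
        h = F.relu(self.conv(x))
        mu, sigma = self.affine(z).chunk(2, dim=1)   # (B, C) each
        h = F.instance_norm(h)                       # zero mean, unit std per channel
        return sigma.unsqueeze(-1).unsqueeze(-1) * h + mu.unsqueeze(-1).unsqueeze(-1)

# Example: condition a crop of the noise field on a latent material code z.
z = torch.randn(1, 64)                # latent code from the encoder
noise = torch.randn(1, 8, 128, 128)   # crop of the random field
block = AdaINConvBlock(8, 32, z_dim=64)
features = block(noise, z)            # (1, 32, 128, 128)
```

Conditioning through feature statistics rather than by concatenating z keeps the generator fully convolutional over the noise crop, so the same infinite field can be restyled into different materials simply by changing z.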