Authors
Yao Feng, Haiwen Feng, Michael J. Black, Timo Bolkart
Max Planck Institute for Intelligent Systems; Max Planck ETH Center for Learning Systems
Summary
This paper presents DECA, an approach that regresses, from a single image, a 3D face shape together with animatable details that are specific to an individual but change with expression. Once trained, the approach animates a face by combining the reconstructed source identity’s shape, head pose, and detail code with the reconstructed source expression’s jaw pose and expression parameters, yielding an animated coarse shape and an animated displacement map. Applying the displacement map to the coarse shape produces the final animated detail shape.
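As a rough illustration of that final step, the sketch below displaces a coarse mesh along its vertex normals by per-vertex displacement values. DECA actually performs this step in UV space with a displacement map and a normal map, so the per-vertex sampling here is an assumption made for brevity, and none of the function names come from the DECA code.

```python
# Minimal sketch (not DECA's actual code): turning a coarse mesh plus a scalar
# displacement per vertex into a detailed mesh by moving each vertex along its
# normal. The displacement values are assumed to be already sampled from the
# UV displacement map to per-vertex scalars.
import numpy as np

def vertex_normals(vertices, faces):
    """Area-weighted per-vertex normals for a triangle mesh."""
    normals = np.zeros_like(vertices)
    tris = vertices[faces]                      # (F, 3, 3)
    face_n = np.cross(tris[:, 1] - tris[:, 0],  # un-normalized face normals
                      tris[:, 2] - tris[:, 0])
    for i in range(3):                          # accumulate onto the three corners
        np.add.at(normals, faces[:, i], face_n)
    norm = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(norm, 1e-8, None)

def apply_displacements(vertices, faces, per_vertex_disp):
    """Detailed shape = coarse shape + displacement along the vertex normal."""
    n = vertex_normals(vertices, faces)
    return vertices + per_vertex_disp[:, None] * n

# Toy usage with a single triangle
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2]])
disp = np.array([0.01, -0.005, 0.0])            # e.g. sampled from a UV displacement map
detailed = apply_displacements(verts, faces, disp)
```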
Abstract
While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details enabling animation of reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.
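To make the parameter groups named in the abstract concrete, the sketch below slices a flat regressor output into those groups. The dimensions are the values commonly reported for the public DECA release (100 shape, 50 expression, 6 pose, 50 albedo, 27 spherical-harmonic lighting, 3 camera, 128 detail) and should be read as assumptions; DECA additionally uses a separate detail encoder rather than a single monolithic vector.

```python
# Hedged sketch of splitting one regressed parameter vector into the groups
# named in the abstract. Dimensions are assumptions based on the public DECA
# release, not a specification.
import numpy as np

PARAM_DIMS = {
    "shape": 100,       # FLAME identity coefficients
    "expression": 50,   # FLAME expression coefficients
    "pose": 6,          # global head rotation + jaw pose
    "albedo": 50,       # albedo / texture coefficients
    "lighting": 27,     # 9 spherical-harmonic bands x 3 color channels
    "camera": 3,        # weak-perspective scale + 2D translation
    "detail": 128,      # person-specific detail code
}

def split_parameters(code):
    """Slice a flat regressor output into named parameter groups."""
    params, start = {}, 0
    for name, dim in PARAM_DIMS.items():
        params[name] = code[start:start + dim]
        start += dim
    assert start == code.shape[0]
    return params

flat = np.random.randn(sum(PARAM_DIMS.values()))   # stand-in for a network output
params = split_parameters(flat)
```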
Contribution
- The first approach to learn an animatable displacement model from in-the-wild images that can synthesize plausible geometric details by varying expression parameters
- A novel detail consistency loss that disentangles identity-dependent and expression-dependent facial details (a minimal sketch of the idea follows this list)
- Reconstruction of geometric details that is, unlike most competing methods, robust to common occlusions, wide pose variation, and illumination variation. This is enabled by our low-dimensional detail representation, the detail disentanglement, and training from a large dataset of in-the-wild images
- State-of-the-art shape reconstruction accuracy on two different benchmarks
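The detail consistency loss referenced above exploits multiple images of the same person: because the detail code should capture only identity-dependent detail, swapping detail codes between two such images should still let each detailed rendering reproduce its input. The sketch below illustrates that swap; `encode`, `decode_displacements`, `render`, and `photometric_loss` are hypothetical placeholders, not DECA's actual API.

```python
# Minimal sketch of the idea behind the detail-consistency loss: for two images
# of the same person, swap their detail codes, re-render, and penalize the
# difference to the inputs. All callables are hypothetical placeholders.
def detail_consistency_loss(img_a, img_b, encode, decode_displacements,
                            render, photometric_loss):
    params_a, detail_a = encode(img_a)   # coarse params + detail code, image A
    params_b, detail_b = encode(img_b)   # same person, different image

    # Swap the person-specific detail codes; expression and pose stay with each image.
    disp_a_with_b = decode_displacements(detail_b, params_a)
    disp_b_with_a = decode_displacements(detail_a, params_b)

    # If details are truly person-specific, the swapped renderings should still
    # reproduce the corresponding input photographs.
    loss_a = photometric_loss(render(params_a, disp_a_with_b), img_a)
    loss_b = photometric_loss(render(params_b, disp_b_with_a), img_b)
    return loss_a + loss_b
```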
Related Works
Coarse reconstruction; Detail reconstruction; Animatable detail reconstruction
Comparisons
3DMM-CNN, PRNet, RingNet, 3DDFA-V2, MGCNet, Extreme3D
Overview
During training, DECA estimates parameters to reconstruct the face shape for each image with the aid of shape-consistency information and then learns an expression-conditioned displacement model by leveraging detail-consistency information from multiple images of the same individual. While the analysis-by-synthesis pipeline is by now standard, the expression-conditioned displacement model and the detail consistency loss are the key novelty. Once trained, DECA animates a face by combining the reconstructed source identity’s shape, head pose, and detail code with the reconstructed source expression’s jaw pose and expression parameters to obtain an animated coarse shape and an animated displacement map, which together yield the final animated detail shape.
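A hedged sketch of this recombination step follows, reusing the parameter grouping assumed in the earlier sketch; `decode_coarse_shape` and `decode_displacements` are hypothetical stand-ins for the FLAME decoder and the detail decoder, not DECA's actual functions.

```python
# Sketch of animation by recombination: identity-related codes come from the
# "source identity" image, while jaw pose and expression coefficients come from
# the "source expression" image. Both decoders are hypothetical placeholders.
def animate(identity_params, expression_params,
            decode_coarse_shape, decode_displacements):
    # Keep shape, head pose, and the detail code from the identity image.
    shape = identity_params["shape"]
    head_pose = identity_params["pose"][:3]
    detail_code = identity_params["detail"]

    # Take jaw pose and expression coefficients from the expression image.
    jaw_pose = expression_params["pose"][3:]
    expression = expression_params["expression"]

    coarse = decode_coarse_shape(shape, expression, head_pose, jaw_pose)
    displacement_map = decode_displacements(detail_code, expression, jaw_pose)
    return coarse, displacement_map   # combined into the final detail shape
```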