Authors
Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang
University of California, Merced; Adobe Research
Summary
In this paper, we propose an effective face completion algorithm using a deep generative model. Given a masked image, our goal is to synthesize the missing contents that are both semantically consistent with the whole face and visually realistic.
Abstract
In this paper, we propose an effective face completion algorithm using a deep generative model. Different from well-studied background completion, the face completion task is more challenging as it often requires generating semantically new pixels for the missing key components (e.g., eyes and mouths) that contain large appearance variations. Unlike existing nonparametric algorithms that search for patches to synthesize, our algorithm directly generates contents for missing regions based on a neural network. The model is trained with a combination of a reconstruction loss, two adversarial losses and a semantic parsing loss, which ensures pixel faithfulness and local-global content consistency. With extensive experimental results, we demonstrate qualitatively and quantitatively that our model is able to deal with a large area of missing pixels in arbitrary shapes and generate realistic face completion results.
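The combined objective named above can be sketched in a few lines. Below is a minimal PyTorch-style illustration, assuming an L2 pixel reconstruction term and hypothetical loss weights `w_rec`, `w_adv`, `w_parse`; the paper's exact formulation and coefficients may differ:

```python
import torch
import torch.nn.functional as F

def completion_loss(generated, target, d_local_out, d_global_out,
                    parsing_pred, parsing_gt,
                    w_rec=1.0, w_adv=1e-3, w_parse=1e-2):
    """Hypothetical weighting of the four training terms."""
    # Reconstruction loss: pixel-wise faithfulness to the ground truth.
    l_rec = F.mse_loss(generated, target)
    # Two adversarial losses: push both the local (mask-region) and the
    # global (whole-image) discriminators to score the output as real.
    l_adv = (F.binary_cross_entropy(d_local_out, torch.ones_like(d_local_out))
             + F.binary_cross_entropy(d_global_out, torch.ones_like(d_global_out)))
    # Semantic parsing loss: per-pixel labels of the completed image
    # should match those of the original image.
    l_parse = F.cross_entropy(parsing_pred, parsing_gt)
    return w_rec * l_rec + w_adv * l_adv + w_parse * l_parse
```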
Contributions
- We propose a deep generative completion model that consists of an encoding-decoding generator and two adversarial discriminators to synthesize missing contents from random noise
- We tackle the challenging face completion task and show that the proposed model is able to generate semantically valid patterns based on learned representations of this object class
- We demonstrate the effectiveness of semantic parsing in generation, which renders completion results that are both more plausible and consistent with surrounding contexts
Related Work
Image completion; Image generation
Overview
Network architecture. The model consists of one generator, two discriminators and a parsing network. The generator takes the masked image as input and outputs the generated image. Pixels in the non-mask region of the generated image are replaced with the original pixels. The two discriminators learn to distinguish the synthesized contents in the mask region and the whole generated image as real or fake. The parsing network, a pretrained model that remains fixed, further ensures that the newly generated contents are photo-realistic and encourages consistency between new and old pixels. Note that only the generator is needed during testing.
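A minimal sketch of this data flow, assuming hypothetical module names (`generator`, `d_local`, `d_global`, `parsing_net`) and a binary mask where 1 marks missing pixels; this illustrates the pipeline rather than reproducing the authors' implementation:

```python
import torch

def forward_pass(image, mask, generator, d_local, d_global, parsing_net,
                 training=True):
    """image: (N, 3, H, W) original; mask: (N, 1, H, W), 1 = missing."""
    # Fill the masked region with noise and run the generator.
    noise = torch.rand_like(image)
    masked_input = image * (1 - mask) + noise * mask
    generated = generator(masked_input)
    # Keep original pixels outside the mask; only masked pixels are new.
    completed = generated * mask + image * (1 - mask)
    if not training:
        return completed  # only the generator is needed at test time
    # The local discriminator judges the synthesized mask contents,
    # the global discriminator the whole completed image.
    score_local = d_local(completed * mask)
    score_global = d_global(completed)
    # parsing_net's parameters are frozen elsewhere (requires_grad_(False));
    # gradients still flow through it back to the generator.
    parsing = parsing_net(completed)
    return completed, score_local, score_global, parsing
```

Note the design choice in the last step: freezing only the parser's parameters, rather than wrapping the call in `torch.no_grad()`, keeps the parsing network fixed while still letting the semantic parsing loss backpropagate into the generator.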