Authors
Sebastian Lutz, Konstantinos Amplianitis, Aljosa Smolic
Trinity College Dublin
Summary
To tackle the problem of image matting, we use a generative adversarial network. The generator of this network is a convolutional encoder-decoder network that is trained both with the help of the ground-truth alphas and with the adversarial loss from the discriminator.
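As a rough illustration of this training signal, the snippet below sketches a combined generator objective: a supervised loss against the ground-truth alpha plus an adversarial term from the discriminator. The L1 loss, the binary cross-entropy adversarial formulation, and the weight `lambda_adv` are illustrative assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

lambda_adv = 0.01  # assumed weighting of the adversarial term

def generator_loss(alpha_pred, alpha_gt, d_logits_on_fake):
    # Supervised term: distance between predicted and ground-truth alpha.
    alpha_loss = F.l1_loss(alpha_pred, alpha_gt)
    # Adversarial term: G wants D to score compositions made with the
    # predicted alpha as "real" (label 1).
    adv_loss = F.binary_cross_entropy_with_logits(
        d_logits_on_fake, torch.ones_like(d_logits_on_fake))
    return alpha_loss + lambda_adv * adv_loss
```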
Abstract
We present the first generative adversarial network (GAN) for natural image matting. Our novel generator network is trained to predict visually appealing alphas with the addition of the adversarial loss from the discriminator that is trained to classify well-composited images. Further, we improve existing encoder-decoder architectures to better deal with the spatial localization issues inherent in convolutional neural networks (CNN) by using dilated convolutions to capture global context information without downscaling feature maps and losing spatial information. We present state-of-the-art results on the alphamatting online benchmark for the gradient error and give comparable results in others. Our method is particularly well suited for fine structures like hair, which is of great importance in practical matting applications, e.g. in film/TV production.
Contribution
- We propose a generative adversarial network (GAN) for natural image matting. We improve on the network architecture of Xu et al. to better deal with the spatial localization issues inherent in CNNs by using dilated convolutions to capture global context information without downscaling feature maps and losing spatial information (see the sketch below). Furthermore, we improve on the decoder structure of the network and use it as the generator in our generative adversarial model. The discriminator is trained on images composited with both the ground-truth alpha and the predicted alpha, and therefore learns to recognize well-composited images, which helps the generator learn alpha predictions that lead to visually appealing compositions.
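The snippet below illustrates the dilated-convolution idea referenced above: stacking convolutions with increasing dilation grows the receptive field while the feature map keeps its spatial resolution. The channel counts and dilation rates are assumed for illustration; this is not the authors' exact encoder.

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions with dilation 2 and 4: the receptive field grows
# as if the kernels were 5x5 and 9x9, but no downscaling takes place.
block = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=4, dilation=4),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 256, 40, 40)
y = block(x)
print(y.shape)  # torch.Size([1, 256, 40, 40]) -- spatial size preserved
```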
Related Works
- Local sample-based natural image matting
- Deep learning in natural image matting
Overview
Xu et al. have recently shown that it is possible to train an encoder-decoder network on their matting dataset to produce state-of-the-art results. We build on their approach and train a deep generative adversarial network on the same dataset. Our AlphaGAN architecture consists of one generator G and one discriminator D. G takes as input an image composited from the foreground, alpha and a random background, with the trimap appended as a fourth channel, and attempts to predict the correct alpha. D tries to distinguish between real 4-channel inputs and fake inputs, where the first three channels are composited from the foreground, background and the predicted alpha.
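The sketch below shows one plausible way to assemble these 4-channel inputs. Tensor shapes and function names are assumptions for illustration, not the authors' code; `fg` and `bg` are RGB tensors of shape (B, 3, H, W), while `alpha` and `trimap` have shape (B, 1, H, W).

```python
import torch

def generator_input(fg, bg, alpha_gt, trimap):
    # Composite the foreground over a random background using the
    # ground-truth alpha, then append the trimap as the 4th channel.
    composite = alpha_gt * fg + (1.0 - alpha_gt) * bg  # (B, 3, H, W)
    return torch.cat([composite, trimap], dim=1)       # (B, 4, H, W)

def discriminator_input(fg, bg, alpha, trimap):
    # "Real" inputs composite with the ground-truth alpha; "fake" inputs
    # composite with the alpha predicted by G. Either way the first three
    # channels are the composition and the 4th is the trimap.
    composite = alpha * fg + (1.0 - alpha) * bg
    return torch.cat([composite, trimap], dim=1)
```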