Authors
Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
University of California, Davis
Abstract
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code, models, and demo can be found at https://github.com/Yuheng-Li/MixNMatch.
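To make the training idea concrete, below is a minimal PyTorch sketch of the adversarial joint image-code distribution matching used to learn the latent factor encoders. It follows the ALI/BiGAN-style formulation this objective is based on; the network definitions, dimensions, and names (`E`, `G`, `D`, `D_IMG`, `D_CODE`) are toy placeholders, not the paper's actual architecture, which inherits FineGAN's hierarchical generator.

```python
import torch
import torch.nn as nn

D_IMG, D_CODE = 64 * 64 * 3, 32  # toy sizes, chosen for illustration only

# Stand-in networks; MixNMatch uses one such encoder per factor
# (background, pose, shape, texture) on top of a FineGAN generator.
E = nn.Sequential(nn.Linear(D_IMG, D_CODE))              # image -> latent code
G = nn.Sequential(nn.Linear(D_CODE, D_IMG), nn.Tanh())   # latent code -> image
D = nn.Sequential(nn.Linear(D_IMG + D_CODE, 1))          # critic on (image, code) pairs

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(x_real, z_prior):
    """D learns to tell apart (real image, encoded code) pairs
    from (generated image, sampled code) pairs."""
    with torch.no_grad():              # E and G are held fixed on this step
        z_enc = E(x_real)
        x_fake = G(z_prior)
    d_real = D(torch.cat([x_real, z_enc], dim=1))
    d_fake = D(torch.cat([x_fake, z_prior], dim=1))
    return bce(d_real, torch.ones_like(d_real)) + \
           bce(d_fake, torch.zeros_like(d_fake))

def encoder_generator_loss(x_real, z_prior):
    """E and G are trained to fool D (labels flipped), which drives the
    two joint image-code distributions to match; step only E/G params."""
    d_real = D(torch.cat([x_real, E(x_real)], dim=1))
    d_fake = D(torch.cat([G(z_prior), z_prior], dim=1))
    return bce(d_real, torch.zeros_like(d_real)) + \
           bce(d_fake, torch.ones_like(d_fake))
```

When the two joint distributions match, the encoder's output on real images is distributed like the prior code the generator was trained with, which is what lets encoded real images plug directly into the generator.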
Contributions
- We introduce MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture factors from real images with minimal human supervision. This gives MixNMatch fine-grained control over image generation, where each factor can be controlled independently. MixNMatch can take as input either real reference images, sampled latent codes, or a mix of both (see the inference sketch after this list).
- Through various qualitative and quantitative evaluations, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation. Furthermore, we show that MixNMatch's learned disentangled representation leads to state-of-the-art fine-grained object category clustering results on real images.
- We demonstrate a number of interesting applications of MixNMatch, including sketch2color, cartoon2img, and img2gif.
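As referenced in the first bullet, the sketch below illustrates the mix-and-match inference interface: each factor is encoded from its own reference image and the four codes are recombined by the generator. The function name, the `encoders` dict keys, and the generator signature are hypothetical placeholders, not the released code's API.

```python
import torch

def mix_and_match(generator, encoders, img_bg, img_pose, img_shape, img_tex):
    """Generate one image whose background comes from img_bg, object pose
    from img_pose, shape from img_shape, and texture from img_tex.
    `encoders` is assumed to hold one trained encoder per factor."""
    b = encoders["background"](img_bg)   # background code
    z = encoders["pose"](img_pose)       # pose code
    p = encoders["shape"](img_shape)     # shape code
    c = encoders["texture"](img_tex)     # texture code
    return generator(b, z, p, c)

# Any factor can instead be sampled from its prior rather than encoded,
# e.g. a random texture with everything else taken from reference images:
#   c = torch.randn(1, code_dim)   # code_dim is a placeholder
```

Because every factor passes through its own code, any subset can be swapped for prior samples or for codes from a different image, which is what drives the listed applications, e.g. img2gif re-encodes only the pose code frame by frame while the other three codes stay fixed.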
Related Work
Conditional image generation; Disentangled representation learning
Comparisons
Simple-GAN, InfoGAN, LR-GAN, StackGANv2, FineGAN