Authors
Tero Karras, Samuli Laine, Timo Aila
NVIDIA
Summary
Our generator architecture makes it possible to control image synthesis via scale-specific modifications to the styles. The generator starts from a learned constant input and adjusts the “style” of the image at each convolution layer based on the latent code, thereby directly controlling the strength of image features at different scales. Combined with noise injected directly into the network, this architectural change leads to automatic, unsupervised separation of high-level attributes (e.g., pose, identity) from stochastic variation (e.g., freckles, hair) in the generated images, and enables intuitive scale-specific mixing and interpolation operations.
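To make the style mechanism concrete, below is a minimal PyTorch sketch of adaptive instance normalization (AdaIN) as the paper describes it: a learned affine transform (the “A” block) maps the intermediate latent code to per-channel scale and bias values, which re-style each normalized feature map. The class and parameter names here are illustrative, not taken from the official implementation.

    import torch
    import torch.nn as nn

    class AdaIN(nn.Module):
        """Adaptive instance normalization: normalize each feature map,
        then apply a per-channel scale and bias derived from the style."""
        def __init__(self, latent_dim, num_channels):
            super().__init__()
            # "A" in the paper: a learned affine transform from the
            # intermediate latent w to per-channel (scale, bias) pairs.
            self.affine = nn.Linear(latent_dim, 2 * num_channels)

        def forward(self, x, w):
            # x: (batch, channels, height, width); w: (batch, latent_dim)
            style = self.affine(w)               # (batch, 2 * channels)
            scale, bias = style.chunk(2, dim=1)  # two (batch, channels) tensors
            scale = scale[:, :, None, None]
            bias = bias[:, :, None, None]
            # Instance normalization: zero mean, unit variance per feature
            # map and per sample, computed over the spatial dimensions.
            mean = x.mean(dim=(2, 3), keepdim=True)
            std = x.std(dim=(2, 3), keepdim=True) + 1e-8
            return scale * (x - mean) / std + bias

Because each convolution layer has its own AdaIN, feeding different latent codes to different layers is what enables the scale-specific mixing and interpolation operations mentioned above.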
Abstract
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
Contribution
- We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature.
- We have collected a new dataset of human faces, Flickr-Faces-HQ (FFHQ), consisting of 70,000 high-quality images at 1024×1024 resolution.
Overview
While a traditional generator feeds the latent code through the input layer only, we first map the input to an intermediate latent space W, which then controls the generator through adaptive instance normalization (AdaIN) at each convolution layer. Gaussian noise is added after each convolution, before evaluating the nonlinearity. Here “A” stands for a learned affine transform, and “B” applies learned per-channel scaling factors to the noise input. The mapping network f consists of 8 layers, and the synthesis network g consists of 18 layers (two for each resolution). The output of the last layer is converted to RGB using a separate 1 × 1 convolution, similar to Karras et al. Our generator has a total of 26.2M trainable parameters, compared to 23.1M in the traditional generator.
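As a concrete illustration, here is a minimal PyTorch sketch of one synthesis-network step as described above: a 3×3 convolution, noise scaled by learned per-channel factors (“B”), the nonlinearity, and then AdaIN driven by the learned affine transform (“A”). It reuses the AdaIN sketch from the Summary section; layer sizes and names are assumptions for illustration, not the official implementation.

    import torch
    import torch.nn as nn

    class StyledConvBlock(nn.Module):
        """One conv step of the synthesis network g: 3x3 convolution,
        noise scaled by learned per-channel factors ("B"), nonlinearity,
        then style modulation via AdaIN (driven by "A")."""
        def __init__(self, in_channels, out_channels, latent_dim=512):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels,
                                  kernel_size=3, padding=1)
            # "B": learned per-channel scaling factors for the noise input.
            self.noise_scale = nn.Parameter(torch.zeros(1, out_channels, 1, 1))
            self.adain = AdaIN(latent_dim, out_channels)  # sketched above
            self.act = nn.LeakyReLU(0.2)

        def forward(self, x, w):
            x = self.conv(x)
            # Gaussian noise is added after the convolution,
            # before evaluating the nonlinearity.
            noise = torch.randn(x.size(0), 1, x.size(2), x.size(3),
                                device=x.device)
            x = x + self.noise_scale * noise
            x = self.act(x)
            return self.adain(x, w)

    # Mapping network f: 8 fully connected layers taking z to w
    # (both 512-dimensional in the paper).
    latent_dim = 512
    mapping = nn.Sequential(
        *[m for _ in range(8)
          for m in (nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2))]
    )

    # The synthesis network g starts from a learned 4x4 constant input and
    # stacks 18 such conv layers, two per resolution; a separate 1x1
    # convolution converts the final activations to RGB.
    const_input = nn.Parameter(torch.randn(1, 512, 4, 4))
    block = StyledConvBlock(512, 512)
    w = mapping(torch.randn(2, latent_dim))          # batch of 2 latents
    x = block(const_input.expand(2, -1, -1, -1), w)  # (2, 512, 4, 4)

Note the design point the sketch makes explicit: the latent code never enters through an input layer; it reaches the image only through the per-layer AdaIN modulations and is therefore applied at a specific scale each time.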