Karras et al. : Alias-Free Generative Adversarial Networks

https://nvlabs.github.io/stylegan3/
Advances in Neural Information Processing Systems, 34 (2021)

Yingtao Tian (Google Brain Tokyo)

*The PDF of this article is available in the IPSJ Digital Library, where IPSJ members can read it free of charge (http://id.nii.ac.jp/1001/00190378/).


A Line of State-of-the-Art GANs for Image Generation

 The work we introduce in this article, known as StyleGAN3 [1], is a generative adversarial network (GAN) capable of producing images of very high quality. As the authors' animated examples show (https://nvlabs.github.io/stylegan3/), the proposed model successfully generates high-resolution images of different genres while mitigating issues that remained in its predecessors, which were already impressive in their own right.

 However, it should be noted that StyleGAN3 is not a standalone work; rather, it is the latest installment in a line of state-of-the-art GANs, all by Karras et al. This line comprises the progressively grown GAN [2], StyleGAN [3], StyleGAN2 [4], and, most recently, StyleGAN3. While the line of development follows a general trend, each work offers a drastic improvement over its predecessors and is a distinctive contribution in its own right.

 More importantly, all these works have been released by NVIDIA Labs, to which most of the authors belong, in the form of code and pre-trained models (available at https://github.com/tkarras/progressive_growing_of_gans and https://github.com/NVlabs/{stylegan,stylegan2,stylegan3}). These releases have facilitated many follow-up works that take inspiration from them by applying the techniques to other domains.
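
 As a concrete illustration, the released models can be sampled from Python with only a few lines. The sketch below follows the usage pattern documented in the NVlabs repositories; the pickle file name is a placeholder, and the exact interface should be double-checked against the repository README.

```python
# A minimal sketch of sampling from a released pre-trained generator,
# following the usage pattern documented in the NVlabs repositories.
# The .pkl file name is a placeholder; download an actual network
# snapshot via the links in the repository README, and run this inside
# a clone of the repository so its custom classes can be unpickled.
import pickle

import torch

with open('stylegan3-network-snapshot.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # exponential-moving-average generator

z = torch.randn([1, G.z_dim]).cuda()    # a random latent code
c = None                                # class labels (unused for unconditional models)
img = G(z, c)                           # NCHW float32 image, roughly in [-1, 1]
```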

Context: Previous Versions

 The progressively grown GAN [2] was proposed at a time when GANs were considered able to confidently generate high-quality images only at resolutions around 128×128, and to have difficulty going beyond that. It amazed the community by generating images at much higher resolutions, up to 1024×1024, with great fidelity. In doing so, it introduced many new ideas; probably the most important is that the generator and the discriminator are trained in a progressive fashion, with layers incrementally added during training.
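
 To make the progressive scheme concrete, the following is a minimal PyTorch-style sketch of the fade-in idea, with hypothetical block and to-RGB modules; it illustrates the technique rather than reproducing the authors' implementation.

```python
# A hypothetical fade-in wrapper illustrating progressive growing: the
# output of a newly added higher-resolution block is blended with an
# upsampled copy of the previous output, with alpha ramping from 0 to 1
# over the course of training so the network grows smoothly.
import torch
import torch.nn.functional as F

class FadeIn(torch.nn.Module):
    def __init__(self, old_blocks, new_block, to_rgb_old, to_rgb_new):
        super().__init__()
        self.old_blocks = old_blocks  # generator blocks trained so far
        self.new_block = new_block    # newly added block (doubles resolution)
        self.to_rgb_old = to_rgb_old  # 1x1 conv to RGB at the old resolution
        self.to_rgb_new = to_rgb_new  # 1x1 conv to RGB at the new resolution

    def forward(self, z, alpha):
        x = self.old_blocks(z)
        low = F.interpolate(self.to_rgb_old(x), scale_factor=2)  # old path, upsampled
        high = self.to_rgb_new(self.new_block(x))                # new high-res path
        return (1 - alpha) * low + alpha * high                  # smooth blend
```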

 In the next work, StyleGAN [3], the authors proposed to completely change how the generator is organized. Unlike a traditional GAN, whose generator is a stack of blocks taking the latent code as input, StyleGAN maps the input to a space of styles and feeds the mapped styles into each block of the generator using adaptive instance normalization (AdaIN), which is why it is called a "style-based generator". Together with other techniques, StyleGAN produces better results, can be trained without progressive growing, and enables style mixing and control by the nature of its design.

 The authors kept analyzing StyleGAN's behavior and found subtle artifacts in the generated images that look like water droplets. They hypothesized that behind these artifacts lies a systematic problem caused by how the training process interacts with the instance normalization in the network. To handle this issue, the authors proposed, in the follow-up work StyleGAN2 [4], to revise the generator architecture by introducing a "demodulation" operation in place of instance normalization, along with several jointly proposed regularizations. As a result, the generated images reach even higher quality, and the design enables projecting images back into the latent space and attributing generated images (given an image and a trained model, assessing whether the image was produced by that model), which is important given the possible misuse of high-quality image generation techniques. Furthermore, the authors also proposed StyleGAN2-ADA [5], whose adaptive discriminator augmentation allows stable training with limited data, making it possible to train a reasonable model with significantly fewer images.
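
 To clarify the two normalization schemes discussed above, the following is a minimal PyTorch sketch of AdaIN (StyleGAN) and of weight (de)modulation (StyleGAN2); shapes and names are illustrative assumptions, not the authors' code.

```python
# Illustrative sketches of AdaIN and weight (de)modulation. Shapes:
# x is (N, C_in, H, W), style scales/biases are per-sample, per-channel.
import torch
import torch.nn.functional as F

def adain(x, style_scale, style_bias, eps=1e-8):
    # AdaIN (StyleGAN): normalize each feature map per sample, then
    # modulate with a style-derived scale and bias of shape (N, C).
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    x = (x - mu) / (sigma + eps)
    return style_scale[:, :, None, None] * x + style_bias[:, :, None, None]

def modulated_conv2d(x, weight, style_scale, demodulate=True, eps=1e-8):
    # Weight (de)modulation (StyleGAN2): fold the per-sample style scale
    # into the convolution weights and renormalize the weights, instead
    # of normalizing the activations themselves.
    # weight: (C_out, C_in, k, k); style_scale: (N, C_in).
    N, C_in, H, W = x.shape
    C_out, k = weight.shape[0], weight.shape[-1]
    w = weight[None] * style_scale[:, None, :, None, None]  # (N, C_out, C_in, k, k)
    if demodulate:
        d = torch.rsqrt((w ** 2).sum(dim=(2, 3, 4), keepdim=True) + eps)
        w = w * d
    # A grouped convolution applies a different weight tensor per sample.
    x = x.reshape(1, N * C_in, H, W)
    w = w.reshape(N * C_out, C_in, k, k)
    out = F.conv2d(x, w, padding=k // 2, groups=N)
    return out.reshape(N, C_out, *out.shape[2:])
```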

The Latest Work

 The authors did not stop where StyleGAN2 is, even though it facilitates image generation with high quality, allows style control, and enables attribution of generated images. In the next work, StyleGAN3 [1], the authors studied the observation of "texture sticking" in images generated by StyleGAN2: fine details appear glued to pixel coordinates rather than moving with the depicted surfaces, so generation is not equivariant to translation or rotation. Solving this aliasing issue became the motivation for further improvements.

 In this direction, the authors carefully analyze the network structure and point out that the issue is caused by the discrete signal interpretation implicit in the convolutional layers, which lets aliasing leak positional information into the network. Having identified the source of the issue, the authors propose to revise the architecture based on a continuous signal interpretation, making the three fundamental kinds of operations in the generator, namely convolution, up/down-sampling, and nonlinearity, equivariant to translation and rotation. In practice, the aliasing issue is fundamentally resolved, and the generator shows even more impressive results. Furthermore, a very extensive analysis demonstrates the contribution of each detail of the revision.
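
 As a rough illustration of the continuous-signal treatment of the nonlinearity, the following is a minimal PyTorch sketch of the "upsample, apply the pointwise nonlinearity, low-pass filter, downsample" recipe. The paper uses carefully designed windowed-sinc filters and custom kernels; this simplified version (bilinear upsampling, a small binomial blur) is an assumption-laden approximation, not the authors' implementation.

```python
# Sketch of an anti-aliased nonlinearity: a pointwise nonlinearity creates
# new high frequencies, so apply it at a higher sampling rate and band-limit
# the result before returning to the original rate.
import torch
import torch.nn.functional as F

def lowpass_blur(x):
    # Separable 3-tap binomial low-pass filter, applied per channel.
    k = torch.tensor([1., 2., 1.])
    k = torch.outer(k, k)
    k = (k / k.sum()).to(x)
    C = x.shape[1]
    w = k.expand(C, 1, 3, 3)
    return F.conv2d(x, w, padding=1, groups=C)

def filtered_lrelu(x, up=2):
    x = F.interpolate(x, scale_factor=up, mode='bilinear', align_corners=False)
    x = F.leaky_relu(x, 0.2)   # nonlinearity at the higher sampling rate
    x = lowpass_blur(x)        # remove the newly introduced high frequencies
    return F.avg_pool2d(x, up) # back to the original sampling rate
```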

Conclusion

 The line of works introduced in this article represents a dramatic improvement of GAN techniques in general. These works not only push the frontier of image generation by GANs, as evidenced by the substantial improvements and well-motivated changes in the papers, but also enable many AI practitioners to try such models and build applications and web services of their own. As all the resources are openly available, they are well worth checking out when materializing one's next idea.

Acknowledgements
Special thanks to Tarin Clanuwat, Yujin Tang and Yuma Koizumi for helpful suggestions and editorial efforts.

References
1) Karras et al. : Alias-Free Generative Adversarial Networks, Advances in Neural Information Processing Systems (2021).
2) Karras et al. : Progressive Growing of GANs for Improved Quality, Stability, and Variation, International Conference on Learning Representations (2018).
3) Karras et al. : A Style-Based Generator Architecture for Generative Adversarial Networks, Conference on Computer Vision and Pattern Recognition (2019).
4) Karras et al. : Analyzing and Improving the Image Quality of StyleGAN, Conference on Computer Vision and Pattern Recognition (2020).
5) Karras et al. : Training Generative Adversarial Networks with Limited Data, Advances in Neural Information Processing Systems (2020).

(Received April 28, 2022)
(Published on note July 15, 2022)

■Yingtao Tian
Received his Ph.D. in Computer Science from Stony Brook University, New York, USA, in 2019. He is currently a Research Software Engineer at Google Brain Tokyo, working on image generation with representation learning, computational creativity, artificial life, and digital humanities.