SURF-GAN

Abstract

Over the years, 2D GANs have achieved great success in photorealistic portrait generation. However, they lack 3D understanding in the generation process and therefore suffer from multi-view inconsistency. To alleviate this issue, many 3D-aware GANs have been proposed and have shown notable results, but 3D GANs struggle with editing semantic attributes, and their controllability and interpretability have received little attention. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. We then inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods, which allow only implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation makes 3D control directly compatible with many StyleGAN-based techniques (e.g., inversion and stylization) and also reduces the computational resources required.

TL;DR We present a novel 3D-aware GAN, SURF-GAN, that can disentangle and control semantic attributes, and we then make StyleGAN 3D-controllable by injecting the prior of SURF-GAN.

SURF-GAN

We propose a novel 3D-aware GAN, SURF-GAN, which discovers semantic attributes in an unsupervised manner by learning layer-wise subspaces in an INR (NeRF-based) generator.
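To make the idea of a layer-wise subspace concrete, here is a minimal NumPy sketch of a single subspace layer. It assumes an EigenGAN-style linear form, `phi = U @ (L * z) + mu`, with an orthonormal basis `U`, per-direction scales `L`, and an offset `mu`; the function names, dimensions, and random initialization are hypothetical and are not taken from the SURF-GAN code. In a trained model these quantities would be learned, whereas here they are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_subspace_layer(feature_dim, subspace_dim, rng):
    """Build one hypothetical subspace layer: basis U, scales L, offset mu."""
    # Reduced QR of a random matrix yields orthonormal columns
    # (shape: feature_dim x subspace_dim).
    U, _ = np.linalg.qr(rng.standard_normal((feature_dim, subspace_dim)))
    L = np.abs(rng.standard_normal(subspace_dim)) + 0.5  # per-direction scale
    mu = np.zeros(feature_dim)                           # subspace origin
    return U, L, mu

def subspace_forward(layer, z):
    """Map a low-dimensional code z to a feature vector for one generator layer."""
    U, L, mu = layer
    return U @ (L * z) + mu

def traverse(layer, z, index, delta):
    """Shift a single coordinate of z, i.e. move along one basis direction."""
    z_edit = z.copy()
    z_edit[index] += delta
    return subspace_forward(layer, z_edit)

layer = make_subspace_layer(feature_dim=256, subspace_dim=6, rng=rng)
z = rng.standard_normal(6)
phi = subspace_forward(layer, z)
phi_edit = traverse(layer, z, index=2, delta=3.0)
shift = phi_edit - phi  # equals 3.0 * L[2] * U[:, 2]
```

Because the basis is orthonormal, changing one coordinate of `z` moves the layer's features along exactly one direction, which is what would make each coordinate a candidate for a separately controllable attribute.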

Architecture

3D-aware generation

Semantic attributes discovered by SURF-GAN

3D-controllable StyleGAN

We then inject the 3D prior from a low-resolution 3D-aware GAN (SURF-GAN) into a high-resolution 2D GAN (StyleGAN).

Control over pose

+ Stylization

+ Editing

Video

With a canonical mapping, our model can process portrait images in arbitrary poses.

Also, it is compatible with numerous StyleGAN-based techniques, e.g., Toonifying.

Editing the pose of challenging real images with HyperStyle.

In addition, we can edit semantic attributes using discovered attribute directions.

Smile (using InterFaceGAN)

Illumination (using SURF-GAN samples)
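Editing along an attribute direction, as in the examples above, is usually a linear move in latent space. The sketch below assumes the common form `w' = w + alpha * d` with a unit-normalized direction `d`. The 512-dimensional latent and the random `smile_dir` are placeholders for illustration only; a real direction would come from a method such as InterFaceGAN or from SURF-GAN samples.

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Move latent code w by alpha along a unit-normalized attribute direction."""
    d = direction / np.linalg.norm(direction)
    return w + alpha * d

rng = np.random.default_rng(0)
w = rng.standard_normal(512)          # hypothetical 512-D latent code
smile_dir = rng.standard_normal(512)  # placeholder for a discovered direction

w_more = edit_latent(w, smile_dir, alpha=2.0)   # strengthen the attribute
w_less = edit_latent(w, smile_dir, alpha=-2.0)  # weaken the attribute
```

The sign of `alpha` selects the direction of the edit and its magnitude sets the strength; feeding `w_more` or `w_less` to the generator would produce the edited image.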

Limitation

Our 3D-controllable StyleGAN is not based on a 3D representation such as a mesh or NeRF, so in video generation it exhibits the “texture sticking” problem pointed out in StyleGAN3 (especially in hair and beards). This is one of the most noticeable artifacts in GAN-generated videos. We expect it to be mitigated with StyleGAN3.

BibTeX

@inproceedings{kwak2022injecting,  
  title={Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis},  
  author={Kwak, Jeong-gi and Li, Yuanming and Yoon, Dongsik and Kim, Donghyeon and Han, David and Ko, Hanseok},  
  booktitle={European Conference on Computer Vision},  
  pages={236--253},  
  year={2022},  
  organization={Springer}  
}

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

ECCV 2022

SURF-GAN, which is a NeRF-based 3D-aware GAN, can discover semantic attributes in an unsupervised manner and enables users to control them as well as camera parameters.
(Trained on 64x64 CelebA and rendered at 256x256)
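As a rough illustration of what controlling camera parameters can mean in a NeRF-based generator, the sketch below places a camera on a sphere around the subject from a yaw and a pitch angle and builds a look-at pose. This parameterization is an assumption about a typical setup, not a description of SURF-GAN's exact camera model; the function names and axis conventions are hypothetical.

```python
import numpy as np

def camera_origin(yaw, pitch, radius=1.0):
    """Camera position on a sphere; yaw = pitch = 0 places it on the +z axis."""
    x = radius * np.cos(pitch) * np.sin(yaw)
    y = radius * np.sin(pitch)
    z = radius * np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])

def look_at_origin(cam_pos, up=np.array([0.0, 1.0, 0.0])):
    """Return a 4x4 camera-to-world matrix whose -z axis points at the origin.

    Degenerate when the viewing direction is parallel to `up` (pitch = +/-90 deg).
    """
    forward = -cam_pos / np.linalg.norm(cam_pos)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward
    c2w[:3, 3] = cam_pos
    return c2w

# Frontal view, then a view rotated 30 degrees in yaw.
c2w_front = look_at_origin(camera_origin(yaw=0.0, pitch=0.0))
c2w_side = look_at_origin(camera_origin(yaw=np.deg2rad(30.0), pitch=0.0))
```

Sweeping `yaw` and `pitch` while keeping the latent code fixed is how an explicit pose change would be requested from such a generator.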

We inject the prior of SURF-GAN into StyleGAN for explicit pose control and editable directions.
(Rendered at 1024x1024)
