This is a PyTorch implementation of GANs, focusing on generating anime faces.
- Build anime-faces dataset
- Implement GANs
- Implement StyleGANs
- Implement Conditional GANs
All anime-face images were collected and preprocessed by myself. Anime-style images for 45 tags (tags.txt) were collected from danbooru.donmai.us using the crawler tool gallery-dl. After unrelated images without anime faces were deleted, the remaining images were processed with the anime face detector lbpcascade_animeface in build_animeface_dataset.py. After cropping, meaningless images were deleted manually, leaving about 100,000 anime faces in total. For conditional GANs, the face images of 20 tags (tags_20.txt), about 50,000 images, are used for training. For StyleGANs, after cropping and filtering out low-quality images, around 60,000 images (512x512) are used for training.
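The detect-and-crop step described above can be sketched as follows. This is a minimal illustration, not the contents of build_animeface_dataset.py: the helper names `expand_box` and `detect_faces`, the 25% margin, and the detector parameters are all assumptions. The detection part relies on OpenCV's `cv2.CascadeClassifier` loaded with the lbpcascade_animeface cascade file, and imports `cv2` lazily so the cropping helper can be used on its own.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def expand_box(box: Box, image_size: Tuple[int, int], margin: float = 0.25) -> Box:
    """Grow a detected face box by `margin` on every side, clipped to the image.

    `image_size` is (width, height). The margin value is an assumption.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    pad_x, pad_y = int(w * margin), int(h * margin)
    left = max(x - pad_x, 0)
    top = max(y - pad_y, 0)
    right = min(x + w + pad_x, img_w)
    bottom = min(y + h + pad_y, img_h)
    return left, top, right - left, bottom - top


def detect_faces(image_path: str, cascade_path: str) -> List[Box]:
    """Detect faces with an OpenCV cascade and return padded crop boxes."""
    import cv2  # requires opencv-python; imported lazily on purpose

    cascade = cv2.CascadeClassifier(cascade_path)
    image = cv2.imread(image_path)
    if image is None:  # unreadable or missing file
        return []
    gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    # Detector parameters below are illustrative, not tuned values.
    faces = cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24)
    )
    img_h, img_w = gray.shape
    return [expand_box(tuple(int(v) for v in f), (img_w, img_h)) for f in faces]
```

Each returned box can then be used to slice the image array (`image[y:y + h, x:x + w]`) before resizing to the training resolution.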
Dataset is here.
Dataset for StyleGAN is here.
A single NVIDIA RTX 2080 Ti GPU was used for training. I trained StyleGAN and StyleGAN2 separately for 8 days, and the models may not have fully converged.
To train a model (default: dcgan), run:
python train.py --dataRoot path_to_dataset --cuda
In train.py, several GANs are available through the `--model` option:
- GAN: use `--model 0` to run `models/gan.py`
- DCGAN: use `--model 1` to run `models/dcgan.py`
- W-DCGAN: use `--model 2` to run `models/wdcgan.py`
- W-DCGAN_GP: use `--model 3` to run `models/wdcgan_gp.py`
- W-ResGAN_GP: use `--model 4` to run `models/wresgan_gp.py`
- CGAN: use `--model 5` to run `models/cdcgan.py`
- ACGAN: use `--model 6` to run `models/acgan_resnet.py`
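One way such an index-based selection could be wired up with `argparse` is sketched below. The flags `--dataRoot`, `--cuda`, and `--model` and the file paths come from the list above; the function names `build_parser` and `resolve_model` are hypothetical, and the actual train.py may be organized differently.

```python
import argparse

# Index-to-file mapping as listed above.
MODEL_FILES = {
    0: "models/gan.py",
    1: "models/dcgan.py",
    2: "models/wdcgan.py",
    3: "models/wdcgan_gp.py",
    4: "models/wresgan_gp.py",
    5: "models/cdcgan.py",
    6: "models/acgan_resnet.py",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a GAN on anime faces")
    parser.add_argument("--dataRoot", required=True, help="path to the dataset")
    parser.add_argument("--cuda", action="store_true", help="train on the GPU")
    # DCGAN (index 1) is the stated default.
    parser.add_argument(
        "--model", type=int, default=1, choices=sorted(MODEL_FILES),
        help="index of the model to train",
    )
    return parser


def resolve_model(argv):
    """Parse the arguments and return them with the selected model file."""
    args = build_parser().parse_args(argv)
    return args, MODEL_FILES[args.model]


if __name__ == "__main__":
    args, model_file = resolve_model(["--dataRoot", "path_to_dataset", "--model", "3"])
    print(f"would train {model_file} (cuda={args.cuda})")
```

Restricting `--model` with `choices` makes an out-of-range index fail at parse time instead of deep inside training.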
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
Each row of the generated samples shows one category, in the following order.
From top to bottom: green_hair, orange_hair, purple_hair, silver_hair, blue_eyes, green_eyes, pink_eyes, red_eyes
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
Each row of the generated samples shows one category, in the following order.
From top to bottom: green_hair, orange_hair, purple_hair, silver_hair, blue_eyes, green_eyes, pink_eyes, red_eyes
| Training for 100 epochs (.gif) | Generated 64x64 samples (.jpg) |
|---|---|
| ![]() | ![]() |
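The one-category-per-row layout used for the conditional samples can be produced by repeating each class label across a row before sampling. The sketch below is an assumption-laden illustration: the function name `make_condition_grid`, the latent size of 100, and the mapping of the eight listed categories to indices 0 to 7 are all hypothetical (the conditional models themselves were trained on 20 tags).

```python
import torch
import torch.nn.functional as F

# Row order from the text above; the index assignment is hypothetical.
CATEGORIES = [
    "green_hair", "orange_hair", "purple_hair", "silver_hair",
    "blue_eyes", "green_eyes", "pink_eyes", "red_eyes",
]


def make_condition_grid(samples_per_row: int = 8, nz: int = 100, seed: int = 0):
    """Return noise, integer labels, and one-hot labels for a row-per-class grid."""
    gen = torch.Generator().manual_seed(seed)
    n_classes = len(CATEGORIES)
    # [0, 0, ..., 1, 1, ..., 7, 7]: each class fills one row of the grid.
    labels = torch.arange(n_classes).repeat_interleave(samples_per_row)
    one_hot = F.one_hot(labels, num_classes=n_classes).float()
    noise = torch.randn(labels.numel(), nz, generator=gen)
    return noise, labels, one_hot


noise, labels, one_hot = make_condition_grid()
print(noise.shape, labels[:10].tolist())
```

Feeding `noise` and the labels to a conditional generator and passing the result to `torchvision.utils.make_grid(images, nrow=8)` would then place each category on its own row.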
The video of training progression (12,000 iterations) is here.
Here are the generated 512x512 samples (.jpg).
More generated samples can be found here.
The video of training progression (3,000 iterations) is here.
Here are the generated 512x512 samples (.jpg).
More generated samples can be found here.
Here are the style mixing examples.
- A plain GAN is really hard to train because it is difficult to balance D and G.
- DCGAN generally works better than GAN and generates clearer, more detailed images.
- WGAN trains more stably, its loss serves as a metric for convergence during training, and it also avoids the mode collapse problem.
- WGAN-GP, which uses a gradient penalty, performs better and generates better images than WGAN with weight clipping.
- CGAN is also hard to train and easily suffers from mode collapse.
- ACGAN seems more stable and more powerful at generating conditional images.
- The blobs seen during training are part of how StyleGAN 'creates' new features, and they are in fact doing something useful.
- StyleGAN2 changes the AdaIN normalization to eliminate the blobs problem and improve overall quality.
- The most important thing in training GANs is learning to balance D and G.
- Adding noise to D's inputs and labels helps stabilize training.
- Adam is always a good choice, but an exponentially decaying learning rate does not seem to help and makes no significant difference.
- Training D several times per G update sometimes seems helpful (WGAN), but it easily makes D too strong and upsets the existing balance.
- Giving D a higher learning rate than G seems to lead to better results.
- D should be a little more powerful than G so that it can lead G to generate better images.
- The learning rate is one of the most critical hyperparameters.
- One of the more powerful ways to improve performance is data cleaning/augmentation.
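Two of the tips above, adding noise to D's inputs and labels and giving D a higher learning rate than G, can be sketched in a single discriminator update. Everything here is illustrative: the tiny stand-in networks, the helper names `noisy_targets` and `add_instance_noise`, the noise levels, and the learning rates (4e-4 for D, 1e-4 for G) are assumptions, not the settings used in this repository.

```python
import torch
import torch.nn as nn


def noisy_targets(batch_size, real, flip_prob=0.05, smooth=0.1):
    """Soft labels for D, with a small chance of flipping each one."""
    jitter = torch.rand(batch_size) * smooth
    # Real labels fall in (1 - smooth, 1], fake labels in [0, smooth).
    targets = 1.0 - jitter if real else jitter
    flip = torch.rand(batch_size) < flip_prob
    return torch.where(flip, 1.0 - targets, targets)


def add_instance_noise(images, std=0.1):
    """Add Gaussian noise to the images D sees (real and fake alike)."""
    return images + std * torch.randn_like(images)


# Tiny stand-in networks for 64x64 RGB images (not the repository's models).
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
G = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh(), nn.Unflatten(1, (3, 64, 64)))

# D gets the higher learning rate; both values are illustrative.
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
criterion = nn.BCEWithLogitsLoss()


def d_step(real_images):
    """One discriminator update with noisy inputs and noisy labels."""
    batch = real_images.size(0)
    fake_images = G(torch.randn(batch, 100)).detach()  # do not update G here
    opt_d.zero_grad()
    loss_real = criterion(
        D(add_instance_noise(real_images)).squeeze(1), noisy_targets(batch, real=True)
    )
    loss_fake = criterion(
        D(add_instance_noise(fake_images)).squeeze(1), noisy_targets(batch, real=False)
    )
    loss = loss_real + loss_fake
    loss.backward()
    opt_d.step()
    return loss.item()


print(d_step(torch.rand(4, 3, 64, 64) * 2 - 1))
```

Keeping the two optimizers separate also makes it easy to experiment with the D-to-G update ratio mentioned above, by calling `d_step` more than once per generator update.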


















