Generative Adversarial Network

Fishest edited this page May 9, 2017 · 8 revisions

Introduction

In the field of image generation, we have already covered the Variational Autoencoder. The problem with VAEs is that the output images can be very blurry due to the pixel-by-pixel loss function: the network easily converges to a solution that blurs out details to minimize the loss. To deal with this problem, a Generative Adversarial Network uses a Discriminator that determines whether an image is "real" or not, and a Generator, similar to a VAE's decoder, that produces images. The idea is that the two networks compete with each other until the Generator produces realistic images.
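As a minimal sketch of the adversarial setup (plain numpy, with illustrative logit values; the binary cross-entropy formulation here matches the TensorFlow loss used later in this page):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, label):
    # Binary cross-entropy of a probability p against a 0/1 label
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Illustrative logits: D's raw scores for one real image and one generated image
d_logit_real = 2.0   # D is fairly confident this image is real
d_logit_fake = -2.0  # D is fairly confident this image is fake

# The discriminator wants real -> 1 and fake -> 0
d_loss = bce(sigmoid(d_logit_real), 1) + bce(sigmoid(d_logit_fake), 0)

# The generator wants D to label its output as real (label 1)
g_loss = bce(sigmoid(d_logit_fake), 1)

print(d_loss, g_loss)
```

Training alternates between minimizing `d_loss` over the discriminator's weights and `g_loss` over the generator's weights, so improving one network increases the other's loss.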

A much more detailed explanation can also be found here.

The following sections discuss whether this method works and how well it works in certain cases. We will skip the easy MNIST examples.

The code can be found here.

CIFAR dataset

The most interesting thing about training on the CIFAR dataset is that the network no longer works if the input images are normalized. The only difference is here:
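For reference, the normalization in question maps pixel values from [0, 255] to [-1, 1]. A typical implementation looks like this (a sketch, not the exact line from the repo):

```python
import numpy as np

# Raw CIFAR images arrive as uint8 values in [0, 255]
batch = np.array([[0.0, 127.5, 255.0]], dtype=np.float32)

# Map [0, 255] -> [-1, 1]; this is the variant that failed to train here
normalized = batch / 127.5 - 1.0

print(normalized)  # [[-1.  0.  1.]]
```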

Here is the result when the input is not normalized, i.e. the pixel range stays 0-255:

During training, the Discriminator loss stays stable around 0.5-0.8, because the loss function is:
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit, labels=tf.ones_like(D_logit)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logit, labels=tf.zeros_like(D_fake_logit)))
dloss = d_loss_real + d_loss_fake

If the generator is good enough to fool the discriminator, the discriminator can do no better than output 0.5 for every image, so each loss term settles around ln 2 ≈ 0.69, which is what we see in the case above. Therefore, after some epochs of training, the results are quite impressive. The images don't have any specific content, but they look quite realistic at first glance.
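To see where 0.69 comes from, here is a quick sanity check in plain numpy, mirroring the formula that `tf.nn.sigmoid_cross_entropy_with_logits` computes:

```python
import numpy as np

def sigmoid_xent(logit, label):
    # Numerically stable form used by tf.nn.sigmoid_cross_entropy_with_logits:
    # max(x, 0) - x*z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

# A fully fooled discriminator outputs probability 0.5, i.e. logit 0, everywhere
loss_real = sigmoid_xent(0.0, 1.0)  # real image, label 1
loss_fake = sigmoid_xent(0.0, 0.0)  # fake image, label 0

print(loss_real, loss_fake)  # both ln(2) ≈ 0.6931
```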

However, if the input is normalized from [0, 255] to [-1, 1], the produced images are very poor. Here is the result:

The discriminator loss in this case decreased very quickly and converged to 0. This means the discriminator became so good (or the generator so bad) that it could perfectly recognize which images are real. After reaching this point, the network has essentially failed and will never produce meaningful images. You can see that the generator gave up and produced all-black images.
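Numerically, this collapsed regime looks like the following (the logit values are illustrative, not taken from the actual run): a discriminator loss near 0 means the discriminator assigns confidently correct logits to every image, while the generator's loss explodes.

```python
import numpy as np

def sigmoid_xent(logit, label):
    # Stable sigmoid cross-entropy, same computation as TensorFlow's version
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

# A near-perfect discriminator: strongly positive logits on real images,
# strongly negative logits on fake ones
d_loss = sigmoid_xent(10.0, 1.0) + sigmoid_xent(-10.0, 0.0)
print(d_loss)  # ~9e-5, effectively zero

# Meanwhile the generator's loss on the same fake logit is huge:
# this is the "generator gave up" regime
g_loss = sigmoid_xent(-10.0, 1.0)
print(g_loss)  # ~10.0
```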

CelebA dataset

With CIFAR being successful, I also tried the CelebA dataset on essentially the same network. Here is the result:

The network failed to converge regardless of whether I normalized the images or not.

It's interesting that this happens, and currently I have no clue why. I also tried other DCGAN code, and some of it seems to work correctly, especially this implementation in Torch: here. When I ran that code, the generated images were pretty decent for the first 23 epochs, then collapsed to random noise (similar to my results) on the 24th and 25th epochs.

Later I found that it is generally difficult to train a GAN successfully. The discriminator can easily become too powerful and destroy the generator, causing the generator loss to spike. Some of the proposed remedies in effect weaken the discriminator: dropping batch normalization, using fewer layers, and using smaller layer dimensions. I personally feel that weakening the discriminator this way will degrade both the quality of the images and the amount of information the discriminator learns.

After applying these improved GAN training methods, the model can consistently produce reasonable results after training.
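One such technique, one-sided label smoothing from the "Improved Techniques for Training GANs" paper, is easy to retrofit onto the loss shown earlier: target 0.9 instead of 1.0 for real images, so the discriminator can never drive its loss to zero and crush the generator. A sketch (plain numpy mirroring the TensorFlow call; the logit value is illustrative):

```python
import numpy as np

def sigmoid_xent(logit, label):
    # Same computation as tf.nn.sigmoid_cross_entropy_with_logits
    return max(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))

# One-sided label smoothing: real targets become 0.9, fake targets stay 0.0.
# Even a discriminator with a huge positive logit on real images now pays a
# penalty, keeping a floor under its loss.
confident_logit = 10.0
plain = sigmoid_xent(confident_logit, 1.0)     # ~0: loss can vanish
smoothed = sigmoid_xent(confident_logit, 0.9)  # ~1.0: loss floor remains
print(plain, smoothed)
```

In the TensorFlow snippet above, this amounts to replacing `tf.ones_like(D_logit)` with `0.9 * tf.ones_like(D_logit)` in `d_loss_real` only; the fake labels are left at zero.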
