Generative Adversarial Network
In the field of image generation, we have already covered the Variational Autoencoder. The problem with the VAE is that the output images can be very blurry due to its pixel-by-pixel loss function: the network easily converges to a solution that blurs away details to minimize the loss. To deal with this problem, the Generative Adversarial Network uses a Discriminator, which determines whether an image is "real" or not, and a Generator, similar to the decoder of a VAE, which produces images. The idea is that the two networks compete with each other until the generator produces realistic images.
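The alternating competition can be sketched on a toy 1-D problem. Everything below is a hypothetical illustration, not the network from this post: a linear "generator", a logistic-regression "discriminator", and N(4, 1) standing in for the data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy 1-D "images": real data drawn from N(4, 1).
def sample_real(n):
    return rng.normal(4.0, 1.0, n)

# Generator G(z) = w*z + b on noise z ~ N(0, 1); parameters to learn.
g_w, g_b = 1.0, 0.0
# Discriminator D(x) = sigmoid(u*x + c); logistic regression.
d_u, d_c = 0.0, 0.0

lr, n = 0.01, 64
for step in range(2000):
    z = rng.normal(size=n)
    real, fake = sample_real(n), g_w * z + g_b

    # Discriminator ascent step: push D(real) -> 1 and D(fake) -> 0.
    p_real, p_fake = sigmoid(d_u * real + d_c), sigmoid(d_u * fake + d_c)
    d_u += lr * np.mean((1 - p_real) * real - p_fake * fake)
    d_c += lr * np.mean((1 - p_real) - p_fake)

    # Generator ascent step: push D(fake) -> 1 (non-saturating loss).
    p_fake = sigmoid(d_u * fake + d_c)
    grad_out = (1 - p_fake) * d_u        # d log D(fake) / d fake
    g_w += lr * np.mean(grad_out * z)
    g_b += lr * np.mean(grad_out)

print(g_b)  # should have drifted from 0 toward the real mean of 4
```

The generator never sees real data directly; it only gets gradients through the discriminator's judgment, which is the core of the adversarial idea.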
A much more detailed explanation can also be found here.
The following sections discuss whether this method works and how well it works in certain cases. We will skip the easy MNIST examples.
The code can be found here.
The most interesting finding when training on the CIFAR-10 dataset is that the network no longer works if the input images are normalized. The only difference is here
Here is the result when the input is not normalized (pixel range 0-255):
```python
# Discriminator loss: real images should be classified as 1, generated as 0.
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit, labels=tf.ones_like(D_logit)))
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_fake_logit, labels=tf.zeros_like(D_fake_logit)))
dloss = d_loss_real + d_loss_fake
```
If the generator is good enough to fool the discriminator, each of d_loss_real and d_loss_fake should hover around 0.69 (ln 2), since the discriminator can do no better than guessing; that is roughly what happens in the case above. Therefore, after many epochs of training, the results are quite impressive: the images don't have any specific content, but they look fairly real at first glance.
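The 0.69 figure is just ln 2: a fully fooled discriminator outputs probability 0.5, i.e. logit 0, on every example. A quick numpy check, using the same numerically stable formula that tf.nn.sigmoid_cross_entropy_with_logits documents:

```python
import numpy as np

def sigmoid_xent(logits, labels):
    # max(x, 0) - x*z + log(1 + exp(-|x|)), as in TensorFlow's docs
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

# A fully fooled discriminator: logit 0 on both real and fake batches.
logits = np.zeros(128)
d_loss_real = sigmoid_xent(logits, np.ones(128)).mean()
d_loss_fake = sigmoid_xent(logits, np.zeros(128)).mean()
print(d_loss_real, d_loss_fake)  # each is ln 2 ≈ 0.693
```

So a dloss that settles near 2 x 0.69 with neither term collapsing to zero is a sign that neither network is dominating.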
However, if the input is normalized from [0, 255] to [-1, 1], the produced images are very poor. Here is the result:
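One common suspect in this situation (an assumption on my part, not something verified in this post) is a mismatch between the input range and the generator's output activation: with inputs in [-1, 1], the generator should end in tanh, because a sigmoid output only covers [0, 1] and lets the discriminator separate real from fake by value range alone. A minimal sketch of the rescaling, with hypothetical helper names:

```python
import numpy as np

def normalize(img_uint8):
    # [0, 255] -> [-1, 1]
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def denormalize(x):
    # [-1, 1] -> [0, 255], rounding back to uint8
    return np.rint((x + 1.0) * 127.5).clip(0, 255).astype(np.uint8)

img = np.arange(256, dtype=np.uint8)
x = normalize(img)

# tanh covers [-1, 1], matching the normalized input range.
fake = np.tanh(np.random.randn(256).astype(np.float32))
```

If the generator's last layer stays sigmoid after this rescaling, the discriminator wins trivially, which matches the collapse-to-noise behavior described below.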
Since CIFAR-10 was successful, I also tried the CelebA dataset on essentially the same network. Here's the result:
It's unclear why this happens, and I currently have no explanation. I also tried other DCGAN implementations, and some of them seem to work correctly, especially this one written in Torch: here. When I ran that code, the generated images were fairly decent for the first 23 epochs, then collapsed to random noise (similar to my results) on the 24th and 25th epochs.
Later I found that it's quite difficult to train a GAN successfully. The discriminator can easily become too powerful and overwhelm the generator, causing the generator loss to spike. Some of the proposed remedies in fact weaken the discriminator: removing batch normalization, using fewer layers, and shrinking layer dimensions. I personally suspect that weakening the discriminator this way hurts both the quality of the generated images and the amount of information the discriminator learns.
After adopting the improved GAN-training techniques, the model consistently produces reasonable results.
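One widely cited improved technique, from the "Improved Techniques for Training GANs" paper, is one-sided label smoothing: train the discriminator against real labels of 0.9 instead of 1.0 (fake labels stay at 0), which stops it from becoming arbitrarily confident. The numbers below are made up for illustration:

```python
import numpy as np

def sigmoid_xent(logits, labels):
    # same numerically stable formula as tf.nn.sigmoid_cross_entropy_with_logits
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

# A discriminator that is already very confident on real images (logit 3).
real_logits = np.full(64, 3.0)
d_loss_smooth = sigmoid_xent(real_logits, np.full(64, 0.9)).mean()  # target 0.9
d_loss_hard = sigmoid_xent(real_logits, np.ones(64)).mean()         # target 1.0
```

With smoothed labels the loss stays strictly higher than with hard labels at the same confident logits, so the discriminator is penalized for pushing its outputs toward 1 and never saturates, which is one way to keep it from destroying the generator without shrinking its architecture.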