Finished 5th on the Private Leaderboard and 7th on the Public Leaderboard out of 110 participants (550 total submissions).
- Competition: GAN Competition (GAN Challenge)
- Leaderboard: View Standings
- Host: Nitin Kumar Jha
- Participant: Sathvik V (Roll No: 22f2001468)
This project was developed for the GAN Challenge competition. The objective was to train a Generative Adversarial Network (GAN) to generate realistic 32×32 images.
The Twist: Unlike standard datasets (like faces or animals), the training data consisted of 58,578 images of abstract noise patterns (see samples below). My goal was to make a model that could learn the specific statistical distribution of this "chaos" and generate new, indistinguishable noise samples.
The dataset was provided as a collection of zipped JSONL shards. Each record contained:
- Base64 Encoded Image: Required decoding from text to bytes.
- Metadata: EXIF rotation, inversion flags (`invert: true/false`), and alpha masks.
- Visuals: The "real" images looked like white noise, static, or random pixel distributions.
Figure 1: Samples from the training set. The model had to learn to replicate these specific noise textures.
I built a custom `ImageShardDataset` to handle the complex raw data:
- Decoding: Parsed JSONL files and decoded Base64 strings on the fly.
- Standardization: Applied EXIF rotations, handled 16-bit grayscale conversions, and cropped images based on alpha masks.
- Normalization: Resized all inputs to 32x32 and normalized pixel values to `[-1, 1]` for stable GAN training.
Standard DCGANs often suffer from mode collapse. To generate high-quality noise distributions, I implemented a Wasserstein GAN with Gradient Penalty (WGAN-GP).
- Generator:
  - Input: Random latent vector ($z \in \mathbb{R}^{100}$).
  - Layers: `ConvTranspose2d` for upsampling.
  - Innovation: Replaced `BatchNorm2d` with `InstanceNorm2d`, which proved more stable for this specific texture-generation task.
  - Activation: `ReLU` (hidden) and `Tanh` (output).
- Discriminator (Critic):
  - Architecture: Strided `Conv2d` layers for downsampling.
  - No Sigmoid: The critic outputs a raw "realness" score (an estimate of the Wasserstein distance), not a probability.
  - Gradient Penalty: Enforced the Lipschitz constraint to stabilize training and prevent vanishing gradients.
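A minimal sketch of the two networks described above, assuming single-channel 32×32 images; the channel widths are illustrative and the real notebook's layer sizes may differ:

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Maps a 100-d latent vector to a 1x32x32 image in [-1, 1]."""

    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 4, 4, 1, 0),   # 1x1  -> 4x4
            nn.InstanceNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1),  # 4x4  -> 8x8
            nn.InstanceNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1),      # 8x8  -> 16x16
            nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 1, 4, 2, 1),           # 16x16 -> 32x32
            nn.Tanh(),                                    # bound to [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))


class Critic(nn.Module):
    """Outputs an unbounded realness score per image (no sigmoid)."""

    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 4, 2, 1),                    # 32 -> 16
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1),               # 16 -> 8
            nn.InstanceNorm2d(ch * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1),           # 8 -> 4
            nn.InstanceNorm2d(ch * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 4, 1, 4, 1, 0),                # 4 -> 1 (raw score)
        )

    def forward(self, x):
        return self.net(x).view(-1)
```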
- Optimizer: Adam (`lr=0.0001`, `beta1=0.5`, `beta2=0.9`).
- Critic Updates: Trained the critic 5 times for every 1 generator step to ensure the Wasserstein distance estimate was accurate.
- Loss Function: Wasserstein Loss + Gradient Penalty ($\lambda=10$).
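The gradient-penalty term can be sketched as below. The stand-in linear critic and tensor shapes are illustrative only; the penalty pushes the critic's gradient norm toward 1 on random interpolations between real and fake batches:

```python
import torch
import torch.nn as nn


def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP term: penalize critic gradient norms away from 1
    at points interpolated between real and fake samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads = torch.autograd.grad(scores.sum(), mixed, create_graph=True)[0]
    norms = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((norms - 1) ** 2).mean()


# Minimal demo with a stand-in critic (names are illustrative only).
critic = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1))
real = torch.randn(4, 1, 32, 32)
fake = torch.randn(4, 1, 32, 32)
gp = gradient_penalty(critic, real, fake)  # scalar penalty, always >= 0
```

In the training loop this penalty is added to the critic loss `D(fake).mean() - D(real).mean()`, with five such critic steps per generator step.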
The competition used Fréchet Inception Distance (FID) to evaluate performance.
- Generated 1,000 new images using the trained Generator.
- Passed them through a pre-trained Inception-V3 model (pool3 layer).
- Extracted 2048-dimensional feature vectors for submission.
- Minimizing the FID score meant my generated "noise" was statistically indistinguishable from the real "noise."
- Clone the repository.
- Install dependencies: `torch`, `torchvision`, `Pillow`, `pandas`, `numpy`.
- Download the competition shards into an `input/` folder.
- Run the notebook `final-notebook-gan.ipynb`.
```bash
# Example: Generate samples after training
# This will create submission.csv with Inception features
python generate_submission.py
```