Implementation of adversarial example generation using the Fast Gradient Sign Method (FGSM) against pre-trained neural networks. Demonstrates how imperceptible perturbations can fool state-of-the-art image classifiers.
Adversarial examples are inputs intentionally designed to cause a machine learning model to make mistakes. By adding carefully crafted noise to an image, we can make a neural network misclassify it with high confidence — while the changes are invisible to the human eye.
Original Image ("Labrador", 99.8%) + epsilon * FGSM Noise = Adversarial Image ("Missile", 99.9%)
The FGSM attack computes the gradient of the loss with respect to the input image, then creates a perturbation in the direction that maximizes the loss:
perturbation = epsilon * sign(∇_x J(theta, x, y))
adversarial_image = original_image + perturbation
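As a minimal numerical sketch of why this direction is the right one (using NumPy and a hypothetical linear loss in place of a deep network), note that stepping along sign(gradient) increases the loss by epsilon times the L1 norm of the gradient:

```python
import numpy as np

# Toy sketch: for a linear loss J(x) = w . x, the gradient w.r.t. x is w itself,
# so the FGSM step x + epsilon * sign(w) raises J by exactly epsilon * sum(|w|).
w = np.array([0.5, -1.2, 0.3])   # hypothetical input gradient of the loss
x = np.array([0.1, 0.4, -0.2])   # original input
epsilon = 0.05

perturbation = epsilon * np.sign(w)   # epsilon * sign(grad_x J)
x_adv = x + perturbation              # adversarial input

loss = lambda v: w @ v
print(loss(x_adv) - loss(x))   # epsilon * (0.5 + 1.2 + 0.3) ≈ 0.1
```

Each pixel moves by at most epsilon, which is what keeps the perturbation imperceptible while still maximizing the first-order increase in the loss.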
- Target model: MobileNetV2 (pre-trained on ImageNet)
- Attack method: FGSM (Fast Gradient Sign Method)
- Framework: TensorFlow 2.x / Keras
- Loads pre-trained MobileNetV2
- Computes loss gradient w.r.t. input
- Generates adversarial perturbation with configurable epsilon
- Visualizes original vs. adversarial classification
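The gradient step at the core of these features can be sketched in TensorFlow 2.x roughly as follows. This is an illustrative sketch, not the contents of adversarial.py: the function names and the clipping range are assumptions (MobileNetV2's preprocess_input maps pixels to [-1, 1], so that range is used here):

```python
import tensorflow as tf

def fgsm_perturbation(model, image, label, epsilon, loss_fn=None):
    """Return the FGSM perturbation epsilon * sign(grad_x J(theta, x, y))."""
    if loss_fn is None:
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)  # differentiate w.r.t. the input, not the weights
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    return epsilon * tf.sign(gradient)

def adversarial_image(model, image, label, epsilon):
    # Add the perturbation, then keep pixels in the model's valid input range
    # (assumed [-1, 1], as produced by mobilenet_v2.preprocess_input).
    adv = image + fgsm_perturbation(model, image, label, epsilon)
    return tf.clip_by_value(adv, -1.0, 1.0)
```

With the real target model this would be called as `adversarial_image(tf.keras.applications.MobileNetV2(weights="imagenet"), preprocessed_image, true_label, epsilon)`; larger epsilon values make the misclassification more reliable but the noise more visible.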
python adversarial.py

Understanding adversarial attacks is critical for:
- AI Security — building robust models that resist manipulation
- Autonomous systems — ensuring self-driving cars cannot be fooled by adversarial stickers on road signs
- Content moderation — detecting adversarial bypass attempts
- WAF/IDS — defending against AI-powered evasion techniques
- Goodfellow et al. — "Explaining and Harnessing Adversarial Examples" (2014)
- Kurakin et al. — "Adversarial Examples in the Physical World" (2016)
Part of a series of talks on AI security presented at the Chubut Hack cybersecurity conference (2021).