Implementation of adversarial example generation using the Fast Gradient Sign Method (FGSM) against pre-trained neural networks. Demonstrates how imperceptible perturbations can fool state-of-the-art image classifiers.
Adversarial examples are inputs intentionally designed to cause a machine learning model to make mistakes. By adding carefully crafted noise to an image, we can make a neural network misclassify it with high confidence — while the changes are invisible to the human eye.
Original Image ("Labrador", 99.8%) + epsilon * FGSM Noise = Adversarial Image ("Missile", 99.9%)
The FGSM attack computes the gradient of the loss with respect to the input image, then creates a perturbation in the direction that maximizes the loss:
perturbation = epsilon * sign(∇_x J(theta, x, y))
adversarial_image = original_image + perturbation
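As a minimal numerical sketch of why this direction is the right one (using NumPy and a hypothetical linear loss in place of a deep network), note that stepping along sign(gradient) increases the loss by epsilon times the L1 norm of the gradient:

```python
import numpy as np

# Toy sketch: for a linear loss J(x) = w . x, the gradient w.r.t. x is w itself,
# so the FGSM step x + epsilon * sign(w) raises J by exactly epsilon * sum(|w|).
w = np.array([0.5, -1.2, 0.3])   # hypothetical input gradient of the loss
x = np.array([0.1, 0.4, -0.2])   # original input
epsilon = 0.05

perturbation = epsilon * np.sign(w)   # epsilon * sign(grad_x J)
x_adv = x + perturbation              # adversarial input

loss = lambda v: w @ v
print(loss(x_adv) - loss(x))   # epsilon * (0.5 + 1.2 + 0.3) ≈ 0.1
```

Each pixel moves by at most epsilon, which is what keeps the perturbation imperceptible while still maximizing the first-order increase in the loss.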
- Target model: MobileNetV2 (pre-trained on ImageNet)
- Attack method: FGSM (Fast Gradient Sign Method)
- Framework: TensorFlow 2.x / Keras
- Loads pre-trained MobileNetV2
- Computes loss gradient w.r.t. input
- Generates adversarial perturbation with configurable epsilon
- Visualizes original vs. adversarial classification
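The gradient step at the core of these features can be sketched in TensorFlow 2.x roughly as follows. This is an illustrative sketch, not the contents of adversarial.py: the function names and the clipping range are assumptions (MobileNetV2's preprocess_input maps pixels to [-1, 1], so that range is used here):

```python
import tensorflow as tf

def fgsm_perturbation(model, image, label, epsilon, loss_fn=None):
    """Return the FGSM perturbation epsilon * sign(grad_x J(theta, x, y))."""
    if loss_fn is None:
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)  # differentiate w.r.t. the input, not the weights
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    return epsilon * tf.sign(gradient)

def adversarial_image(model, image, label, epsilon):
    # Add the perturbation, then keep pixels in the model's valid input range
    # (assumed [-1, 1], as produced by mobilenet_v2.preprocess_input).
    adv = image + fgsm_perturbation(model, image, label, epsilon)
    return tf.clip_by_value(adv, -1.0, 1.0)
```

With the real target model this would be called as `adversarial_image(tf.keras.applications.MobileNetV2(weights="imagenet"), preprocessed_image, true_label, epsilon)`; larger epsilon values make the misclassification more reliable but the noise more visible.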
python adversarial.py

Understanding adversarial attacks is critical for:
- AI Security — building robust models that resist manipulation
- Autonomous systems — ensuring self-driving cars cannot be fooled by adversarial stickers on road signs
- Content moderation — detecting adversarial bypass attempts
- WAF/IDS — defending against AI-powered evasion techniques
- Goodfellow et al. — "Explaining and Harnessing Adversarial Examples" (2014)
- Kurakin et al. — "Adversarial Examples in the Physical World" (2016)
Part of a series of talks on AI security presented at the Chubut Hack cybersecurity conference (2021).