Walls, Shields, and Illusions: Defenses and Their Limits

· Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, long

Summary

This analysis explores three common defenses against adversarial attacks on a Convolutional Neural Network (CNN) trained on MNIST: FGSM adversarial training, PGD adversarial training, and defensive distillation. A baseline model achieved 99.00% clean accuracy, degrading to 96.60% under FGSM (ε=0.3) and 95.62% under PGD (ε=0.3). FGSM-trained models reached 99.08% clean accuracy, 97.97% against FGSM, and 97.61% against PGD. PGD-trained models also reached 99.08% clean accuracy, with 97.52% against FGSM and 97.27% against PGD. Defensive distillation proved the weakest, with PGD accuracy varying between 93.68% and 95.43%. The study concludes that these defenses improve robustness but do not eliminate the model's "silence", that is, its overconfidence in misclassified adversarial examples: a PGD-trained model misclassified an image with 0.6187 confidence. This highlights the continuous "work → break → adapt" cycle in adversarial machine learning.

Key takeaway

Machine learning engineers building robust models should understand that current adversarial defenses such as PGD training raise the bar but do not eliminate model overconfidence in misclassifications. Integrate awareness mechanisms beyond simple confidence thresholds, since a model's "hesitation" (e.g., 0.6187 confidence) may still be too high to flag an error. Focus on developing models that are "undeluded" rather than just "unbreakable."
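
To make the threshold problem concrete, here is a minimal sketch of a plain confidence gate. The helper `flag_uncertain`, the threshold values, and the way the remaining probability mass is spread over the other nine classes are assumptions for illustration; only the 0.6187 top-class confidence comes from the summary above.

```python
import numpy as np

def flag_uncertain(probs, threshold):
    """Return True when the top-class probability falls below the threshold."""
    return float(np.max(probs)) < threshold

# Hypothetical 10-class output: the wrong class gets 0.6187 (value from the
# summary); the rest of the mass is spread evenly (an assumption).
probs = np.full(10, (1.0 - 0.6187) / 9)
probs[3] = 0.6187

for threshold in (0.5, 0.7, 0.9):
    print(f"threshold={threshold:.1f} flagged={flag_uncertain(probs, threshold)}")
```

A gate set at 0.5 lets this misclassification through unflagged. Raising the threshold would catch it, but would presumably also reject more correct predictions, which is one reason a fixed threshold alone is a blunt instrument.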

Key insights

Adversarial defenses raise robustness but models remain overconfident and "silent" about their limits, perpetuating an arms race.

Method

The study implemented FGSM adversarial training, PGD adversarial training, and defensive distillation on a CNN trained on MNIST. Each model was evaluated against FGSM and PGD attacks with ε ranging from 0.00 to 0.40.
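
As a rough illustration of the two attacks named above, the sketch below applies FGSM and PGD to a toy logistic-regression model in NumPy rather than to the study's CNN. The random weights, the random 784-value input (a stand-in, not MNIST data), the PGD step size, and the iteration count are all assumptions made for the example; the source does not state them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x, for p = sigmoid(w.x + b)."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """Single step of size eps along the sign of the input gradient."""
    x_adv = x + eps * np.sign(input_gradient(x, y, w, b))
    return np.clip(x_adv, 0.0, 1.0)  # keep a valid pixel range

def pgd(x, y, w, b, eps, step=0.01, iters=40):
    """Repeated signed-gradient steps, projected back into the L-inf ball around x."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(input_gradient(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

# Toy setup (all values hypothetical): random linear model, random "image".
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=784)
b = 0.0
x = rng.uniform(size=784)
y = 1.0

# Sweep the same eps range the study reports, 0.00 to 0.40.
for eps in (0.0, 0.1, 0.2, 0.3, 0.4):
    p_clean = sigmoid(x @ w + b)
    p_fgsm = sigmoid(fgsm(x, y, w, b, eps) @ w + b)
    p_pgd = sigmoid(pgd(x, y, w, b, eps) @ w + b)
    print(f"eps={eps:.1f} p(y=1): clean={p_clean:.4f} fgsm={p_fgsm:.4f} pgd={p_pgd:.4f}")
```

On a real CNN the input gradient would come from automatic differentiation instead of the closed form used here, and adversarial training would feed the perturbed inputs back into the training loop.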

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.