Improving Certified Robustness via Adversarial Distillation
Summary
AD-CERT is a certified training objective designed to improve the certified robustness of neural networks against adversarial perturbations. It combines adversarial distillation with an Interval Bound Propagation (IBP) upper bound. The approach targets a known trade-off. Certified training tends to sacrifice standard accuracy. Adversarial training yields empirically robust models, but those models are difficult to formally certify. AD-CERT distills adversarial information over the logit space from an empirically robust teacher model and uses it as a lower-bound surrogate in certified training. IBP over-approximates the worst-case loss within the perturbation set, whereas an adversarial example only gives a lower bound on it. The method achieves state-of-the-art certified performance across several robustness benchmarks. In a unified setup, logit-level adversarial distillation within AD-CERT also improves certified accuracy by up to 5.40 percentage points over robust feature-space distillation objectives.
Key takeaway
For AI scientists developing certifiably robust neural networks, AD-CERT offers a concrete recipe. In your certified training objective, combine adversarial distillation over the logit space, taken from an empirically robust teacher model, with an Interval Bound Propagation (IBP) upper bound. This approach can yield state-of-the-art certified performance. Compared with feature-space distillation, it can improve certified accuracy by up to 5.40 percentage points. It also helps address the trade-off between certified robustness and standard accuracy.
Key insights
AD-CERT improves certified robustness by combining adversarial logit information, distilled from an empirically robust teacher, with an IBP upper bound.
Principles
- Combining adversarial training with IBP improves certified accuracy.
- Logit-space distillation outperforms feature-space distillation in certified accuracy, by up to 5.40 percentage points in a unified setup.
- Adversarial distillation can serve as an effective lower-bound surrogate in certified training.
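The summary does not spell out AD-CERT's exact distillation loss. The sketch below assumes a standard temperature-scaled KL divergence between teacher and student logits, as in Hinton-style knowledge distillation, evaluated on an adversarial input. The function names, the `temperature` parameter, and the logit values are hypothetical and are not taken from the paper. In real training, both logit vectors would come from forward passes on the same adversarial example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between temperature-softened output distributions.

    Assumption (not from the paper): in adversarial distillation both logit
    vectors are computed on the same adversarial example x_adv. Here they
    are passed in directly.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Multiply by T^2 (a common KD convention) so gradient magnitudes stay
    # comparable across temperatures.
    return temperature ** 2 * kl


# Usage with made-up logits for a 3-class problem.
teacher_adv = [2.0, 0.5, -1.0]  # robust teacher on x_adv (hypothetical values)
student_adv = [1.0, 1.0, 0.0]   # student on the same x_adv
loss = logit_distillation_loss(student_adv, teacher_adv, temperature=2.0)
```

This loss matches output distributions over logits rather than intermediate feature maps. That is the distinction between logit-level and feature-space distillation drawn in the principles above.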
Method
AD-CERT combines adversarial distillation over logit space from an empirically robust teacher with an Interval Bound Propagation (IBP) upper bound to optimize certified robustness.
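The summary does not give AD-CERT's loss in closed form, nor does it say how the two terms are weighted. One plausible way to write such an objective is shown below. It assumes an ℓ∞ threat model and a convex combination with a hypothetical weight α in [0, 1]. The symbols are defined as follows:

- θ denotes the student's parameters.
- x_adv is an adversarial example inside the ball of radius ε around x.
- p_T and p_S are the teacher's and student's softmax outputs at temperature τ.
- The underlined and overlined z are the student's IBP lower and upper logit bounds over that ball.

This is an illustrative form, not the paper's stated objective.

```latex
\hat{z}_j(x,\epsilon;\theta) =
\begin{cases}
\underline{z}_y(x,\epsilon;\theta) & j = y \\
\overline{z}_j(x,\epsilon;\theta) & j \neq y
\end{cases}
\qquad
\mathcal{L}(\theta) =
(1-\alpha)\,\tau^{2}\,\mathrm{KL}\!\big(p_T(x_{\mathrm{adv}}) \,\|\, p_S(x_{\mathrm{adv}};\theta)\big)
+ \alpha\,\mathrm{CE}\!\big(\hat{z}(x,\epsilon;\theta),\, y\big)
```

The second term is the usual IBP bound. It takes cross-entropy on the worst-case logits, meaning the lowest possible true-class logit and the highest possible logit for every other class. The first term plays the role of the adversarial lower-bound surrogate.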
In practice
- Implement logit-level distillation for certified training.
- Use IBP upper bounds in certified training objectives.
- Use an empirically robust teacher model as the source for adversarial distillation.
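The following is a minimal sketch of interval bound propagation for a small fully connected ReLU network. It illustrates the IBP upper bound mentioned above and uses plain Python with no libraries. It assumes an ℓ∞ perturbation ball of radius `eps`. The helper names and the toy network are hypothetical. The loss is cross-entropy on the worst-case logits: the lower bound for the true class and the upper bounds for all other classes. This upper-bounds the cross-entropy of any perturbed input in the ball. AD-CERT's actual IBP term may differ in its details.

```python
import math

# Hypothetical toy setup: a network is a list of (W, b) pairs, where W is a
# list of rows (one per output unit) and b is a list of biases. ReLU is
# applied between layers but not after the last one.


def ibp_linear(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b.

    Center/radius form: mu = (u + l) / 2, r = (u - l) / 2,
    so y lies in [W mu + b - |W| r, W mu + b + |W| r].
    """
    mu = [(u + l) / 2 for l, u in zip(lower, upper)]
    rad = [(u - l) / 2 for l, u in zip(lower, upper)]
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        center = sum(w * m for w, m in zip(row, mu)) + b
        spread = sum(abs(w) * r for w, r in zip(row, rad))
        out_lower.append(center - spread)
        out_upper.append(center + spread)
    return out_lower, out_upper


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def worst_case_logits(x, eps, layers):
    """Elementwise logit bounds over the L-inf ball of radius eps around x."""
    lower = [xi - eps for xi in x]
    upper = [xi + eps for xi in x]
    for i, (weights, bias) in enumerate(layers):
        lower, upper = ibp_linear(lower, upper, weights, bias)
        if i < len(layers) - 1:
            lower, upper = ibp_relu(lower, upper)
    return lower, upper


def ibp_upper_bound_loss(x, y, eps, layers):
    """Cross-entropy on worst-case logits: an upper bound on the worst-case CE."""
    lower, upper = worst_case_logits(x, eps, layers)
    # True class gets its lowest possible logit, every other class its highest.
    z = [lower[j] if j == y else upper[j] for j in range(len(lower))]
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[y]


# Usage on a hypothetical 2-input, 2-hidden-unit, 2-class network.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.1]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
x, y = [0.3, 0.2], 1
clean_loss = ibp_upper_bound_loss(x, y, 0.0, layers)
certified_loss = ibp_upper_bound_loss(x, y, 0.1, layers)
```

Setting `eps = 0` collapses the bounds to an ordinary forward pass, which makes a quick sanity check. In practice, these bounds would be computed with batched tensor operations in an autodiff framework so that the loss can be backpropagated alongside the distillation term.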
Topics
- Certified Robustness
- Adversarial Distillation
- Interval Bound Propagation
- Neural Network Verification
- Adversarial Training
- Logit-level Distillation
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.