Latent Anchor-Driven Test Generation for Deep Neural Networks
Summary
Latte is a black-box testing framework for Deep Neural Networks (DNNs). It addresses three limitations of existing latent-space test generation methods: weak control over exploration, limited failure diversity, and semantic drift. Latte encodes input seeds with a pre-trained VQ-VAE, performs a seed-centered, one-step latent mutation guided by anchors sampled from alternative classes, and then decodes the mutated points back to the input space. Across five datasets (including MNIST, CIFAR10, and ImageNet) and ten DNN models (including LeNet-5 and ResNet50), Latte consistently improves fault exposure and behavioral diversity while keeping seed-relative semantic drift low. Under matched budgets, it outperforms baselines such as SINVAD and Mimicry in failure count, diversity, and testing efficiency.
Key takeaway
If you are a machine learning engineer or AI scientist developing or deploying DNNs in safety-critical applications, consider Latte for black-box testing. Compared with existing latent-space methods, it exposes more faults and more diverse model behaviors while keeping semantic drift low, and it is more efficient under matched testing budgets. Using Latte can help you uncover a broader range of model weaknesses and decision instabilities, supporting more robust and reliable DNN deployments.
Key insights
Latte employs anchor-guided, seed-centric latent space exploration to generate diverse, fault-revealing DNN test cases with low semantic drift.
Principles
- Latent space mutation preserves input plausibility.
- Anchor-guided exploration targets decision instability.
- Controlled exploration balances fault exposure and semantic proximity.
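To make the second and third principles concrete, here is a toy sketch that uses a hypothetical 2-D latent space and a linear classifier (both are assumptions for illustration, not Latte's encoder or models). Stepping from a seed toward an anchor of another class eventually crosses the decision boundary, and the distance from the seed grows with the step size, which is the trade-off that controlled exploration has to manage.

```python
import numpy as np


def predict(z, w, b=0.0):
    """Toy linear classifier over a 2-D latent: class 1 if w.z + b > 0, else class 0."""
    return int(z @ w + b > 0)


def sweep_toward_anchor(z_seed, z_anchor, w, steps):
    """For each step size, report whether the label flips and how far the point drifts from the seed."""
    seed_class = predict(z_seed, w)
    results = []
    for step in steps:
        # Move along the seed->anchor direction by a fraction `step` of the full distance.
        z = z_seed + step * (z_anchor - z_seed)
        # Euclidean distance in latent space, used here as a stand-in for seed-relative drift.
        drift = float(np.linalg.norm(z - z_seed))
        results.append((step, predict(z, w) != seed_class, drift))
    return results


z_seed = np.array([-1.0, 0.0])    # hypothetical seed latent, classified as 0
z_anchor = np.array([1.0, 0.0])   # hypothetical anchor from another class, classified as 1
w = np.array([1.0, 0.0])          # decision boundary at z[0] = 0

for step, flipped, drift in sweep_toward_anchor(z_seed, z_anchor, w, [0.25, 0.5, 0.75, 1.0]):
    print(f"step={step:.2f} flipped={flipped} drift={drift:.2f}")
```

In this toy, only steps past the boundary expose a failure (0.75 is the smallest tested step that flips the label), and every larger step adds drift without revealing anything new.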
Method
Encode input seeds with a pre-trained VQ-VAE. Sample anchors from alternative classes. Apply a one-step, seed-centered mutation to each latent representation along the seed-anchor direction. Quantize the mutated latents and decode them to the input space for testing.
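A minimal sketch of the anchor sampling, mutation, and quantization steps, assuming latents are NumPy arrays of shape (N, D) and a codebook of shape (K, D). The linear update `z_seed + step * (z_anchor - z_seed)` is an assumed form of the seed-anchor mutation, and `sample_anchor`, `mutate_toward_anchor`, and `quantize` are hypothetical helpers; the paper's exact update rule, step schedule, and VQ-VAE encoder/decoder are not reproduced here.

```python
import numpy as np


def sample_anchor(latents_by_class, seed_label, rng):
    """Hypothetical helper: draw one latent anchor from a class other than the seed's."""
    other_classes = [c for c in latents_by_class if c != seed_label]
    cls = other_classes[rng.integers(len(other_classes))]
    pool = latents_by_class[cls]
    return pool[rng.integers(len(pool))]


def mutate_toward_anchor(z_seed, z_anchor, step):
    """One-step, seed-centered move along the seed->anchor direction (assumed linear form)."""
    return z_seed + step * (z_anchor - z_seed)


def quantize(z, codebook):
    """Snap each latent vector (row of z) to its nearest codebook entry, as a VQ-VAE does."""
    # Squared Euclidean distance between every latent vector and every code: shape (N, K).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


# Toy usage: two codes, one seed class and one alternative class.
rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents_by_class = {
    0: np.array([[[0.0, 0.0], [0.1, 0.0]]]),
    1: np.array([[[1.0, 1.0], [1.0, 1.0]]]),
}
z_seed = latents_by_class[0][0]
z_anchor = sample_anchor(latents_by_class, seed_label=0, rng=rng)
z_q, idx = quantize(mutate_toward_anchor(z_seed, z_anchor, step=0.8), codebook)
# idx is array([1, 1]); z_q would then go to the VQ-VAE decoder (not shown).
```

Quantizing after the mutation keeps the mutated point on the codebook the decoder was trained on, which is one plausible reason a VQ-VAE gives stable latent representations for this kind of exploration.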
In practice
- Use a VQ-VAE for stable latent representations.
- Tune the exploration degree (e.g., E=3) to balance fault exposure against semantic proximity.
- Apply both single-model and multi-model testing oracles.
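The two oracle types can be sketched as simple predicates. This assumes the single-model oracle counts a prediction that differs from the seed's label as a failure (relying on low semantic drift to keep the true label unchanged) and the multi-model oracle flags disagreement among models, as in differential testing. Each model here is a hypothetical callable returning class scores, and the paper's exact oracle definitions may differ.

```python
import numpy as np


def predicted_class(model, x):
    """Return the arg-max class of a model's score vector."""
    return int(np.argmax(model(x)))


def single_model_failure(model, x_test, seed_label):
    """Single-model oracle (assumed): fail if the prediction departs from the seed label."""
    return predicted_class(model, x_test) != seed_label


def multi_model_failure(models, x_test):
    """Multi-model oracle (assumed): fail if the models disagree on the predicted class."""
    return len({predicted_class(m, x_test) for m in models}) > 1


# Hypothetical stand-in models that ignore their input and return fixed scores.
def model_a(x):
    return np.array([0.1, 0.9])


def model_b(x):
    return np.array([0.8, 0.2])


x = np.zeros((28, 28))
print(single_model_failure(model_a, x, seed_label=0))  # True: predicts class 1
print(multi_model_failure([model_a, model_b], x))      # True: classes 1 and 0
print(multi_model_failure([model_a, model_a], x))      # False: models agree
```

The multi-model predicate needs no ground-truth label, which makes it usable even when the seed label cannot be trusted to survive the mutation.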
Topics
- Deep Neural Networks
- Black-box Testing
- Latent Space Exploration
- VQ-VAE
- Test Generation
- Fault Exposure
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.