Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism
Summary
Sergei Vorobyov and Eugene Ilyushin introduce methods for scaling formal neural network verification, which is often limited by GPU memory. Their work adapts two parallelism schemes from distributed training, Tensor Parallelism (TP) and Fully Sharded Data Parallelism (FSDP), to the `auto_LiRPA` / α,β-CROWN verification framework. TP shards both the weight matrices and the A-matrices (the coefficient matrices of the propagated linear bounds) across GPUs and achieves roughly a 2x peak-memory reduction with P=2 (two GPUs). Its soundness was confirmed on the VNN-COMP 2022 MNIST-FC benchmarks, although the resulting bounds were looser. FSDP shards only the weight matrices and reassembles each layer with a per-layer `AllGather`, producing bounds bitwise identical to the single-GPU baseline. On wide MLPs, FSDP reduced baseline memory by 80-90% and peak memory by 34-39%. It also works with complete verification (β-CROWN + Branch-and-Bound) and with convolutional layers, and it enabled a complete *unsat* result for CIFAR-100 ResNet-large (VNN-COMP 2024). The authors note that in α-CROWN+BaB mode the primary memory bottleneck is the per-neuron alpha tensors, not the weight matrices.
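The gap between the baseline and peak savings can be illustrated with a back-of-the-envelope weight count. The sketch below is hypothetical and not taken from the paper: it assumes fp32 weights, an even split of every layer across P ranks, and one fully gathered copy of a layer held alongside the resident shards during `AllGather`. It ignores A-matrices, intermediate bounds, and alphas, which FSDP does not shard and which push real peak memory higher.

```python
# Hypothetical back-of-the-envelope model (not from the paper):
# fp32 weights, every layer split evenly across P ranks, and during AllGather
# one full copy of a layer is held in addition to the resident shards.
BYTES_FP32 = 4


def layer_bytes(widths):
    """Bytes of each dense weight matrix of shape (widths[i + 1], widths[i])."""
    return [widths[i] * widths[i + 1] * BYTES_FP32 for i in range(len(widths) - 1)]


def fsdp_weight_memory(widths, num_ranks):
    """Return (unsharded total, per-rank resident, per-rank peak) weight bytes."""
    sizes = layer_bytes(widths)
    total = sum(sizes)
    resident = total / num_ranks   # each rank keeps 1/P of every layer
    peak = resident + max(sizes)   # worst case: largest layer fully gathered
    return total, resident, peak


if __name__ == "__main__":
    MiB = 2**20
    total, resident, peak = fsdp_weight_memory([784, 4096, 4096, 4096, 10], num_ranks=4)
    print(f"single GPU:    {total / MiB:.1f} MiB")     # 140.4 MiB
    print(f"FSDP resident: {resident / MiB:.1f} MiB")  # 35.1 MiB
    print(f"FSDP peak:     {peak / MiB:.1f} MiB")      # 99.1 MiB
```

In this toy setting resident weight memory falls by 75% while peak weight memory falls by only about 29%, because the largest layer must still be materialized in full. The paper's 80-90% and 34-39% figures are measurements from real runs, not outputs of this formula.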
Key takeaway
For AI Architects designing robust verification systems, consider integrating Fully Sharded Data Parallelism (FSDP) into your `auto_LiRPA` / α,β-CROWN workflows. On wide MLPs, FSDP cut baseline GPU memory by 80-90% and peak memory by 34-39% while producing bounds bitwise identical to a single-GPU run, and it enabled a complete *unsat* result for CIFAR-100 ResNet-large. This lets you verify larger networks without sacrificing precision. In α-CROWN+BaB mode, however, the per-neuron alpha tensors, which FSDP does not shard, remain the main memory bottleneck.
Key insights
Adapting parallelism from training can mitigate GPU memory limits in neural network verification.
Principles
- FSDP yields bitwise identical verification bounds.
- TP reduces memory but can degrade bound tightness.
- Per-neuron alpha tensors, not weight matrices, are the primary memory bottleneck in α-CROWN+BaB.
Method
Tensor Parallelism (TP) shards both the weight matrices and the A-matrices across GPUs. Fully Sharded Data Parallelism (FSDP) shards only the weight matrices and reassembles each layer on demand with a per-layer `AllGather`. Both are adapted to the `auto_LiRPA` / α,β-CROWN framework.
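The TP layout can be pictured with a two-layer toy example. This is an assumed Megatron-style split (one weight matrix by columns, the next by rows), chosen only for illustration: the summary does not specify the paper's exact partitioning, and the sketch does not model the tightness loss reported for TP. Each simulated rank keeps only its slice of the intermediate A-matrix, which is where the per-GPU memory saving comes from.

```python
import numpy as np


def tp_backward_two_layers(A, W1, W2, num_ranks):
    """Hypothetical TP layout: W2 split by columns, W1 split by rows.

    Network: x1 = W1 @ x0, x2 = W2 @ x1 (ReLU relaxations omitted).
    Back-substitutes A @ x2 into (A @ W2 @ W1) @ x0 while each simulated rank
    holds only an (m, n1 / P) slice of the intermediate A-matrix.
    """
    n1 = W1.shape[0]
    partials = []
    for cols in np.array_split(np.arange(n1), num_ranks):  # one iteration per rank
        A1_shard = A @ W2[:, cols]               # rank-local slice of A @ W2
        partials.append(A1_shard @ W1[cols, :])  # this rank's partial input-layer A
    # Summing the partials stands in for an AllReduce; the summation order differs
    # from a single matmul, so results agree only up to floating-point rounding.
    return sum(partials)


rng = np.random.default_rng(1)
n0, n1, n2, m = 16, 64, 32, 3
W1 = rng.standard_normal((n1, n0))
W2 = rng.standard_normal((n2, n1))
A = rng.standard_normal((m, n2))

tp = tp_backward_two_layers(A, W1, W2, num_ranks=2)
print(np.allclose(tp, A @ W2 @ W1))  # True
```

With P=2, each rank holds a `(3, 32)` slice of the intermediate A-matrix instead of the full `(3, 64)` one.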
In practice
- Use FSDP for memory-efficient, precise verification.
- Combine FSDP with complete verification (β-CROWN + Branch-and-Bound) and convolutional layers (`BoundConv`).
- Explore alpha tensor sharding for further memory gains.
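A rough count shows how alpha tensors can dwarf the weights once BaB processes many domains in parallel. Everything here is an assumption for illustration: the alpha layout of shape (2, n_start, num_domains, n_relu) per ReLU layer and start node, the 9 output specifications, and the batch of 1000 domains. Because FSDP shards only weights, it leaves this term untouched.

```python
# Hypothetical estimate (assumptions, not measurements from the paper):
# each ReLU layer keeps one alpha per (bound side, start-node neuron, BaB domain,
# ReLU neuron), i.e. a tensor of shape (2, n_start, num_domains, n_relu), in fp32.
BYTES_FP32 = 4


def alpha_bytes(hidden_widths, num_specs, num_domains, include_intermediate=True):
    total = 0
    for i, n_relu in enumerate(hidden_widths):
        # Start nodes whose bounds are propagated back through ReLU layer i.
        starts = [num_specs]
        if include_intermediate:
            starts += list(hidden_widths[i + 1:])  # later pre-activation layers
        for n_start in starts:
            total += 2 * n_start * num_domains * n_relu * BYTES_FP32
    return total


def weight_bytes(widths):
    # Dense weight matrices between consecutive layers, fp32.
    return sum(a * b for a, b in zip(widths, widths[1:])) * BYTES_FP32


widths = [784, 1024, 1024, 1024, 10]  # hypothetical MNIST-style MLP
hidden = widths[1:-1]
print(weight_bytes(widths) / 1e6, "MB of weights")
print(alpha_bytes(hidden, num_specs=9, num_domains=1000) / 1e9, "GB of alphas")
print(alpha_bytes(hidden, 9, 1000, include_intermediate=False) / 1e6, "MB, output alphas only")
```

With these numbers the weights occupy about 11.6 MB, while the alphas take about 25.4 GB (about 221 MB even if only output-specification alphas are kept). This is consistent with the authors' observation, although the actual tensor layout in `auto_LiRPA` may differ from the one assumed here.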
Topics
- Neural Network Verification
- Tensor Parallelism
- Fully Sharded Data Parallelism
- GPU Memory Optimization
- auto_LiRPA
- α-CROWN
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.