GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2
Summary
GraphMend is a high-level compiler that eliminates FX graph breaks in PyTorch 2 programs. Graph breaks occur when dynamic control flow or unsupported Python constructs split a model into multiple graphs; each break forces a costly CPU-to-GPU synchronization and reduces optimization opportunities. Built on the Jac compilation framework, GraphMend analyzes and transforms source code before execution using two transformations: Predicated Dynamic Control Flow and Graph-Epilogue Deferred Side Effects. In an evaluation across eight Hugging Face models on NVIDIA RTX 3090 and A40 GPUs, GraphMend removes all fixable breaks, bringing the break count to 0 in 6 models and reducing it from 5 to 2 in another. The result is up to 75% lower latency and up to 8% higher end-to-end throughput.
Key takeaway
AI engineers optimizing PyTorch 2 model performance should consider source-level compilation techniques like GraphMend to eliminate FX graph breaks before execution. This approach reduces CPU-GPU synchronization overhead and lets the compiler build larger, more efficient graphs, which can yield substantial latency and throughput improvements. Check your models for graph breaks caused by dynamic control flow and Python I/O, and apply these transformations to improve GPU utilization.
Key insights
Transforming source code before execution eliminates the fixable graph breaks in PyTorch 2 programs, which improves performance.
Principles
- Source-level analysis enables transformations that bytecode-level tracing cannot perform.
- Deferring side effects prevents graph fragmentation.
- Predicated execution handles dynamic control flow on GPU.
Method
GraphMend uses the Jac framework to parse Python code into a unified IR, identifies Dynamo entry points and graph break types (dynamic control flow, Python I/O), then applies AST transformations before bytecode generation.
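The detection step can be illustrated with a minimal sketch. It uses Python's standard `ast` module as a stand-in for the Jac IR, so it is not GraphMend's implementation. The sample source, the `SIDE_EFFECT_CALLS` set, and the helper names are hypothetical, and the check is a rough heuristic: it flags every `if` statement and every `print` call inside a function decorated with `torch.compile`, without proving that the branch condition depends on tensor data.

```python
import ast

# Hypothetical program to analyze; it is parsed, never executed.
SOURCE = '''
import torch

@torch.compile
def forward(x):
    if x.sum() > 0:
        y = x * 2
    else:
        y = x - 1
    print("forward done")
    return y

def helper(x):
    print("not compiled")
    return x
'''

SIDE_EFFECT_CALLS = {"print"}  # assumed set of Python I/O calls to flag


def is_compile_decorator(node):
    """True for `@torch.compile` or `@torch.compile(...)`."""
    target = node.func if isinstance(node, ast.Call) else node
    return (
        isinstance(target, ast.Attribute)
        and target.attr == "compile"
        and isinstance(target.value, ast.Name)
        and target.value.id == "torch"
    )


def find_break_candidates(source):
    """Return (function, line, kind) tuples for likely graph-break sites."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        if not any(is_compile_decorator(d) for d in func.decorator_list):
            continue  # only inspect functions that are compile entry points
        for node in ast.walk(func):
            if isinstance(node, ast.If):
                findings.append((func.name, node.lineno, "control_flow"))
            elif (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SIDE_EFFECT_CALLS
            ):
                findings.append((func.name, node.lineno, "python_io"))
    return sorted(findings, key=lambda f: f[1])


for name, line, kind in find_break_candidates(SOURCE):
    print(f"{name}: line {line}: {kind}")
```

Only `forward` is reported, because `helper` is not a compile entry point. A real analysis would also need to follow calls made from the entry point and to decide whether each condition actually depends on tensor values.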
In practice
- Replace data-dependent "if-else" branches with "torch.where".
- Defer "print" or "logger" calls to the function epilogue.
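The two rewrites above can be sketched by hand on a small function. This is an illustrative before/after pair, not output produced by GraphMend. The function names are hypothetical, and whether a given PyTorch version compiles the second form into a single graph is not verified here.

```python
import torch


def forward_with_breaks(x):
    # Data-dependent Python branch: the tensor value must be read on the host.
    if x.sum() > 0:
        y = x * 2
    else:
        y = x - 1
    print("intermediate mean:", y.mean())  # Python I/O in the middle of the function
    return torch.relu(y)


def forward_mended(x):
    # Predication: compute both branches and select with a tensor predicate.
    cond = x.sum() > 0
    y = torch.where(cond, x * 2, x - 1)
    deferred_mean = y.mean()  # keep the value that will be logged
    out = torch.relu(y)
    # Epilogue: the side effect runs after all tensor computation.
    print("intermediate mean:", deferred_mean)
    return out


# Both versions agree in eager mode on a positive-sum and a negative-sum input.
for x in (torch.tensor([1.0, -2.0, 3.0]), torch.tensor([-1.0, -2.0, 0.5])):
    assert torch.equal(forward_with_breaks(x), forward_mended(x))
```

The predicated form evaluates both branches on every call, so it only fits branches that are side-effect free, safe to run on any input, and produce tensors of matching shape and dtype. Deferring the `print` changes when the message appears but not what it reports, since the logged value is captured before the epilogue.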
Topics
- PyTorch 2
- FX Graph Breaks
- Code Transformation
- Compiler Optimization
- TorchDynamo
- GPU Performance
Best for: NLP Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.