GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2

2025-09-12 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

GraphMend is a high-level compiler that eliminates FX graph breaks in PyTorch 2 programs. Graph breaks occur when dynamic control flow or unsupported Python constructs force TorchDynamo to split a model into multiple graphs; each break forces costly CPU-to-GPU synchronization and reduces optimization opportunities. Built on the Jac compilation framework, GraphMend analyzes and transforms source code before execution using two transformations: Predicated Dynamic Control Flow and Graph-Epilogue Deferred Side Effects. In an evaluation across eight Hugging Face models on NVIDIA RTX 3090 and A40 GPUs, GraphMend removes all fixable graph breaks, bringing the break count to 0 in 6 models and from 5 to 2 in another. The result is up to 75% lower latency and up to 8% higher end-to-end throughput.

Key takeaway

AI engineers optimizing PyTorch 2 model performance should consider source-level compilation techniques like GraphMend to eliminate FX graph breaks before execution. Removing breaks reduces CPU-GPU synchronization overhead and lets the compiler build larger, more efficient graphs, which the paper reports as substantial latency and throughput improvements. Check your models for graph breaks caused by dynamic control flow and Python I/O, and apply these transformations to maximize GPU utilization.

Key insights

Transforming source code before execution removes fixable PyTorch 2 graph breaks and improves performance.

Principles

Source-level analysis enables transformations that bytecode-level analysis cannot.
Deferring side effects prevents graph fragmentation.
Predicated execution handles dynamic control flow on GPU.

Method

GraphMend uses the Jac framework to parse Python code into a unified IR, identifies Dynamo entry points and graph break types (dynamic control flow, Python I/O), then applies AST transformations before bytecode generation.
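
To make the analysis step concrete, here is a minimal sketch of source-level detection using Python's standard `ast` module. It is not GraphMend's or Jac's implementation: the helper `find_break_candidates`, the `IO_FUNCTIONS` and `IO_OBJECTS` name sets, and the sample `forward` function are all hypothetical. The heuristic also over-approximates, since it flags every `if` statement, whereas only branches that depend on tensor values would actually break the graph.

```python
import ast

# Hypothetical sample source to analyze; `logger` is never executed here.
SOURCE = '''
def forward(x):
    y = x * 2
    if y.sum() > 0:
        y = y + 1
    else:
        y = y - 1
    print("debug", y)
    logger.info("done")
    return y
'''

# Assumed heuristic: names treated as side-effecting Python I/O.
IO_FUNCTIONS = {"print"}
IO_OBJECTS = {"logger", "logging"}


def find_break_candidates(source):
    """Return sorted (line, kind) pairs for constructs that may cause graph breaks."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            # Over-approximation: any `if` is reported as a candidate.
            findings.append((node.lineno, "control_flow"))
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in IO_FUNCTIONS:
                findings.append((node.lineno, "python_io"))
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id in IO_OBJECTS
            ):
                findings.append((node.lineno, "python_io"))
    return sorted(findings)


candidates = find_break_candidates(SOURCE)
for line, kind in candidates:
    print(f"line {line}: {kind}")
```

A real tool would additionally need to decide which branches are data-dependent and which functions are reachable from a compiled entry point; this sketch only illustrates why having the syntax tree, rather than bytecode, makes such constructs easy to locate.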

In practice

Rewrite dynamic "if-else" with "torch.where".
Defer "print" or "logger" calls to function epilogue.
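
The two rewrites above can be sketched by hand as follows. This is an illustrative before/after pair under stated assumptions, not output produced by GraphMend: the function names `branchy`, `predicated`, `noisy`, and `deferred` are hypothetical, and the code runs in eager mode only to show that each rewrite preserves results.

```python
import torch


def branchy(x):
    # Data-dependent branch: the condition is a tensor value, which
    # typically forces a graph break when the function is compiled.
    if x.sum() > 0:
        return x * 2
    return x - 1


def predicated(x):
    # Predicated form: compute both branches, then select elementwise
    # with a tensor condition so no Python-level branch remains.
    cond = x.sum() > 0
    return torch.where(cond, x * 2, x - 1)


def noisy(x):
    y = x + 1
    print("mean after add:", y.mean())  # side effect between tensor ops
    return y * 3


def deferred(x):
    y = x + 1
    pending = y.mean()  # keep the value the message needs
    out = y * 3
    print("mean after add:", pending)  # side effect moved to the epilogue
    return out


pos = torch.tensor([1.0, 2.0, 3.0])
neg = torch.tensor([-1.0, -2.0, -3.0])

same_pos = torch.equal(branchy(pos), predicated(pos))
same_neg = torch.equal(branchy(neg), predicated(neg))
same_deferred = torch.equal(noisy(pos), deferred(pos))
```

Note that the predicated form evaluates both branches, so it trades extra computation for a single uninterrupted graph, and it is only valid when both branches are free of side effects and produce compatible shapes. The performance difference would show up only when these functions are wrapped with `torch.compile`.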

Topics

PyTorch 2
FX Graph Breaks
Code Transformation
Compiler Optimization
TorchDynamo
GPU Performance

Code references

Jaseci-Labs/jaseci

Best for: NLP Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.