TraceLens: Democratizing AI Performance Analysis

Source: AMD ROCm Blogs · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, long

Summary

TraceLens, an open-source tool released on April 27, 2026, by Adeem Jassani et al. at AMD, aims to democratize AI performance analysis by converting complex framework profiler traces into structured summaries and comparisons. It supports PyTorch and JAX and works with backends such as ROCm and CUDA. Its core component, Trace2Tree, builds a hierarchical event tree that links Python operations to the GPU kernels they launch; Hierarchical Performance Breakdowns then use that tree to locate bottlenecks. TraceLens also offers Compute & Roofline Modeling to assess kernel efficiency (achieved TFLOP/s and TB/s), Multi-GPU Communication Analysis to diagnose scaling issues by separating communication time from synchronization skew, and Trace Comparison to quantify the impact of changes. In addition, its Event Replay feature generates minimal, self-contained scripts for debugging specific operations in isolation, and an extensible Python SDK supports custom analyses.
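
To make the roofline idea concrete, here is a minimal sketch of the arithmetic behind such a report for a single GEMM kernel. It is not TraceLens code and does not use its API: the `GemmKernel` class, the helper functions, the kernel duration, and the peak figures (1000 TFLOP/s, 5 TB/s) are all hypothetical values chosen for illustration, and the byte count assumes each matrix is moved exactly once.

```python
from dataclasses import dataclass


@dataclass
class GemmKernel:
    """Hypothetical record for one C[m,n] = A[m,k] @ B[k,n] kernel."""
    m: int
    n: int
    k: int
    bytes_per_element: int  # e.g. 2 for bf16/fp16
    duration_us: float      # measured kernel time from a trace


def gemm_flops(g: GemmKernel) -> int:
    # One multiply and one add per (m, n, k) triple.
    return 2 * g.m * g.n * g.k


def gemm_bytes(g: GemmKernel) -> int:
    # Idealized traffic: read A and B once, write C once.
    return (g.m * g.k + g.k * g.n + g.m * g.n) * g.bytes_per_element


def roofline_point(g: GemmKernel, peak_tflops: float, peak_tbps: float) -> dict:
    seconds = g.duration_us * 1e-6
    flops, nbytes = gemm_flops(g), gemm_bytes(g)
    intensity = flops / nbytes                 # FLOP per byte
    ridge = peak_tflops / peak_tbps            # intensity where the two roofs meet
    return {
        "achieved_tflops": flops / seconds / 1e12,
        "achieved_tbps": nbytes / seconds / 1e12,
        "intensity": intensity,
        "attainable_tflops": min(peak_tflops, intensity * peak_tbps),
        "bound": "compute" if intensity >= ridge else "memory",
    }


# Assumed example: a 4096^3 bf16 GEMM that took 500 microseconds on a
# device with made-up peaks of 1000 TFLOP/s and 5 TB/s.
kernel = GemmKernel(m=4096, n=4096, k=4096, bytes_per_element=2, duration_us=500.0)
point = roofline_point(kernel, peak_tflops=1000.0, peak_tbps=5.0)
print(point)
```

Comparing `achieved_tflops` with `attainable_tflops` indicates how far a kernel sits below its roof, and `bound` indicates which roof applies.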

Key takeaway

For NLP or Computer Vision Engineers optimizing AI workloads, TraceLens offers an open-source way to diagnose performance bottlenecks quickly. Adding it to a profiling workflow turns raw traces into structured reports that help pinpoint slow kernels, assess multi-GPU scaling, and quantify the impact of code or hardware changes, which shortens debugging and optimization cycles.

Key insights

TraceLens transforms complex AI profiler traces into actionable insights for performance optimization across backends such as ROCm and CUDA.

Method

TraceLens consumes PyTorch or JAX profiler traces, converts them into a hierarchical event tree (Trace2Tree) that links Python operations to GPU kernels, then runs analysis modules on that tree for performance breakdowns, compute and roofline modeling, multi-GPU communication analysis, and trace comparison.
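
The tree-building step can be illustrated with a small sketch. This is not the Trace2Tree implementation, only one plausible way to get a similar structure under stated assumptions: events use Chrome-trace-style `ts`/`dur` fields, host events come from a single thread and nest properly, and a launch call is tied to its GPU kernel by a shared `correlation` id. The category names, event names, and the `Node`, `build_tree`, and `gpu_time` helpers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cat: str
    ts: float
    dur: float
    children: list = field(default_factory=list)

    @property
    def end(self) -> float:
        return self.ts + self.dur


def build_tree(events):
    """Nest host events by time containment, then attach kernels by correlation id."""
    host = [e for e in events if e["cat"] != "kernel"]
    gpu = [e for e in events if e["cat"] == "kernel"]
    # Sort by start time; on ties the longer event goes first so parents precede children.
    host.sort(key=lambda e: (e["ts"], -e["dur"]))

    roots, stack, launch_by_corr = [], [], {}
    for e in host:
        node = Node(e["name"], e["cat"], e["ts"], e["dur"])
        # Pop every event that finished before this one started.
        while stack and stack[-1].end <= node.ts:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
        corr = e.get("args", {}).get("correlation")
        if corr is not None:
            launch_by_corr[corr] = node

    for e in gpu:
        node = Node(e["name"], e["cat"], e["ts"], e["dur"])
        parent = launch_by_corr.get(e.get("args", {}).get("correlation"))
        (parent.children if parent else roots).append(node)
    return roots


def gpu_time(node: Node) -> float:
    """Total kernel time attributed to a node and everything beneath it."""
    own = node.dur if node.cat == "kernel" else 0.0
    return own + sum(gpu_time(c) for c in node.children)


# Made-up events (timestamps in microseconds).
events = [
    {"name": "aten::linear", "cat": "cpu_op", "ts": 0, "dur": 100},
    {"name": "aten::addmm", "cat": "cpu_op", "ts": 10, "dur": 80},
    {"name": "launch_kernel", "cat": "runtime", "ts": 20, "dur": 5, "args": {"correlation": 1}},
    {"name": "gemm_kernel", "cat": "kernel", "ts": 150, "dur": 40, "args": {"correlation": 1}},
    {"name": "aten::relu", "cat": "cpu_op", "ts": 110, "dur": 20},
    {"name": "launch_kernel", "cat": "runtime", "ts": 115, "dur": 4, "args": {"correlation": 2}},
    {"name": "relu_kernel", "cat": "kernel", "ts": 195, "dur": 6, "args": {"correlation": 2}},
]

roots = build_tree(events)
for root in roots:
    print(root.name, "GPU time (us):", gpu_time(root))
```

Once kernels hang beneath the operations that launched them, a per-operation breakdown is a recursive sum, which is what makes the hierarchical view useful for finding the operation responsible for slow GPU work.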

Best for: NLP Engineer, Computer Vision Engineer, Machine Learning Engineer, MLOps Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.