A Multi-Dimensional, Per-Pass Empirical Study of the LLVM Optimization Pipeline

2026-07-01 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This empirical study decomposes the LLVM -O3 optimization pipeline into 113 cumulative per-pass prefixes and measures each one. The authors collected 84,750 measurements covering execution time, compile time, binary size, hardware counters, and RAPL energy on 30 PolyBench/C kernels. The pipeline turns out to be non-monotone, with 6.6–9.7% of pass-to-pass transitions degrading performance, and strongly back-loaded: 84.8% of the passes are needed to reach 80% of the speedup. A small Pareto-dominant core of passes, including early-cse and loop-vectorize, drives most of the gains. The final -O3 configuration is Pareto-dominated on (size, speedup) for 29 of 30 kernels. In addition, IR instruction count is an unreliable predictor of runtime, and runtime-targeted passes achieve 30–60% energy savings. The idealized-additive upper bound on phase-interference loss is 46.35%.
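
The two trajectory statistics above (share of degrading transitions, and share of passes needed for 80% of the speedup) can be illustrated with a minimal Python sketch. The definitions here are assumptions made for illustration: `runtimes[0]` is taken to be the unoptimized baseline, `runtimes[k]` the runtime after the first `k` passes, and "speedup" means gain over that baseline. The function names and the sample numbers are hypothetical, and the study itself may use different thresholds or noise handling.

```python
def degrading_fraction(runtimes, tolerance=0.0):
    """Fraction of consecutive prefix transitions where runtime got worse.

    `tolerance` is a relative slack (e.g. 0.01 = 1%) so that tiny
    fluctuations are not counted as regressions.
    """
    transitions = list(zip(runtimes, runtimes[1:]))
    if not transitions:
        return 0.0
    worse = sum(1 for before, after in transitions
                if after > before * (1.0 + tolerance))
    return worse / len(transitions)


def passes_for_share(runtimes, share=0.8):
    """Fraction of passes needed before `share` of the final gain is first reached."""
    baseline = runtimes[0]
    final_gain = baseline / runtimes[-1] - 1.0
    if final_gain <= 0:
        raise ValueError("final configuration is not faster than the baseline")
    n_passes = len(runtimes) - 1
    for k, t in enumerate(runtimes):
        if baseline / t - 1.0 >= share * final_gain:
            return k / n_passes
    return 1.0


# Made-up trajectory for one kernel (seconds); index 0 is the baseline.
example = [10.0, 10.0, 9.5, 9.8, 9.0, 5.0]
print(degrading_fraction(example))  # one of five transitions regresses
print(passes_for_share(example))    # the gain only arrives at the last pass
```

A trajectory like this one is both non-monotone (the 9.5 to 9.8 step) and back-loaded (almost all of the gain comes from the final pass), which is the shape the summary describes.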

Key takeaway

For compiler engineers and ML engineers tuning LLVM-based pipelines, the study shows that the default -O3 configuration is rarely the best trade-off: it is Pareto-dominated on (size, speedup) for 29 of 30 kernels, so an earlier checkpoint offers a better combination of speedup and binary size. Consider implementing a "stop at trajectory-best" flag to avoid late-pipeline regressions, and augment cost models with dynamic hardware counter data, since IR instruction count is an unreliable runtime proxy. Focus optimization effort on the identified Pareto-dominant passes.

Key insights

LLVM's -O3 pipeline is non-monotone and back-loaded: a few passes drive most of the gains, and the idealized-additive upper bound on phase-interference loss is 46.35%.

Principles

Compiler pipelines are non-monotone and order-dependent.
IR instruction count is an unreliable runtime proxy.
Runtime-targeted passes often reduce energy for compute-bound kernels.
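
One way to probe the second principle on your own data is a rank correlation between IR instruction count and measured runtime across prefixes. The sketch below implements Spearman's rank correlation in plain Python. It is an illustration only: the summary does not say which statistic the study used, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5
```

Called as `spearman(ir_counts, runtimes)` for one kernel, a value near 1 would mean instruction count tracks runtime well, while values near 0 or below would be in line with the principle that it is an unreliable proxy.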

Method

Decompose an optimization pipeline into cumulative per-pass prefixes, then measure multi-dimensional metrics (runtime, compile time, binary size, hardware counters, energy) for each prefix.
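
The decomposition step can be sketched in a few lines of Python. This is a simplified illustration, not the study's tooling: the pass list is a short made-up sequence (only early-cse and loop-vectorize are named in the summary, and the order is not the real -O3 order), and `pipeline_string` produces a flat comma-separated string in the style of `opt`'s `-passes=` option, ignoring the module/function/loop nesting that a real -O3 pipeline has. Whether the study counts an empty baseline prefix is not stated, so this sketch starts at length 1.

```python
def cumulative_prefixes(passes):
    """Return every prefix of the pass list: first 1 pass, first 2, ..., all of them."""
    return [passes[:k] for k in range(1, len(passes) + 1)]


def pipeline_string(prefix):
    """Join pass names into a flat comma-separated pipeline description."""
    return ",".join(prefix)


# Hypothetical, heavily shortened pass sequence for illustration.
passes = ["sroa", "early-cse", "instcombine", "loop-vectorize"]

for prefix in cumulative_prefixes(passes):
    # Each prefix is one configuration to build and measure for every kernel.
    print(len(prefix), pipeline_string(prefix))
```

Each prefix then becomes one point on a kernel's trajectory, measured on every metric of interest.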

In practice

Identify Pareto-dominant passes for targeted optimization.
Implement "stop at trajectory-best" flags to avoid regressions.
Augment cost models with dynamic or microarchitectural signals.
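
The first two recommendations can be illustrated on per-prefix checkpoints for a single kernel. The sketch below is a minimal Python example under stated assumptions: the `Checkpoint` type, the function names, and the numbers are hypothetical, dominance is evaluated on (binary size, speedup) only, and "trajectory-best" is taken to mean the earliest prefix with the highest measured speedup.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    prefix_len: int    # number of passes applied so far
    binary_size: int   # bytes, lower is better
    speedup: float     # relative to the baseline, higher is better


def dominates(a, b):
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    return (a.binary_size <= b.binary_size and a.speedup >= b.speedup
            and (a.binary_size < b.binary_size or a.speedup > b.speedup))


def pareto_front(points):
    """Checkpoints that no other checkpoint dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def trajectory_best(points):
    """Earliest checkpoint with the highest speedup (max keeps the first on ties)."""
    return max(points, key=lambda p: p.speedup)


# Made-up trajectory: the final checkpoint is larger and slower than prefix 3.
trajectory = [
    Checkpoint(1, 1000, 1.0),
    Checkpoint(2, 1100, 1.4),
    Checkpoint(3, 1050, 1.6),
    Checkpoint(4, 1300, 1.5),
]

print([p.prefix_len for p in pareto_front(trajectory)])  # [1, 3]
print(trajectory_best(trajectory).prefix_len)            # 3
```

In this toy trajectory the full pipeline (prefix 4) is Pareto-dominated by prefix 3, so a "stop at trajectory-best" policy would end the pipeline one pass early.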

Topics

LLVM
Compiler Optimization
Optimization Pipeline
Phase Ordering
Performance Analysis
Hardware Counters
Energy Efficiency

Best for: AI Scientist, Research Scientist, Software Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.