A Multi-Dimensional, Per-Pass Empirical Study of the LLVM Optimization Pipeline
Summary
This empirical study decomposes the LLVM -O3 optimization pipeline into 113 cumulative per-pass prefixes and measures each one. The researchers collected 84,750 measurements covering execution time, compile time, binary size, hardware counters, and RAPL energy on 30 PolyBench/C kernels. The pipeline turns out to be non-monotone: 6.6–9.7% of pass-to-pass transitions degrade performance. It is also strongly back-loaded: 80% of the speedup arrives only after 84.8% of the passes have run. A small Pareto-dominant core of passes, including early-cse and loop-vectorize, drives most of the gains. The final -O3 configuration is Pareto-dominated on (size, speedup) for 29 of 30 kernels, meaning an earlier prefix is at least as good on both axes. IR instruction count is an unreliable predictor of runtime, and runtime-targeted passes achieve 30–60% energy savings. The idealized-additive upper bound on phase-interference loss is 46.35%.
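The two headline statistics, the share of degrading transitions and the share of passes needed to reach 80% of the speedup, can be illustrated on a single runtime trajectory. The sketch below is only one plausible reading of those definitions, not the study's own code; the function names and the runtime values are hypothetical.

```python
from typing import List


def degrading_fraction(runtimes: List[float]) -> float:
    """Fraction of consecutive prefix-to-prefix transitions where runtime increases."""
    transitions = list(zip(runtimes, runtimes[1:]))
    worse = sum(1 for before, after in transitions if after > before)
    return worse / len(transitions)


def passes_needed_for(runtimes: List[float], share: float = 0.8) -> int:
    """Smallest prefix index whose speedup gain reaches `share` of the final gain.

    runtimes[0] is the unoptimized baseline; runtimes[i] is the runtime
    after the first i passes (an assumption of this sketch).
    """
    baseline, final = runtimes[0], runtimes[-1]
    target = share * (baseline / final - 1.0)
    for i, t in enumerate(runtimes):
        if baseline / t - 1.0 >= target:
            return i
    return len(runtimes) - 1


# Hypothetical runtimes in seconds: baseline followed by seven passes.
trajectory = [10.0, 10.0, 10.5, 10.0, 9.0, 5.0, 5.5, 5.0]

print(degrading_fraction(trajectory))  # 2 of 7 transitions get slower
print(passes_needed_for(trajectory))   # index of the first prefix reaching 80% of the gain
```

In this made-up trajectory most of the gain comes from one late pass, which is the shape the summary describes as back-loaded.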
Key takeaway
For compiler engineers and ML engineers optimizing LLVM-based pipelines, this study shows that the default -O3 configuration is often suboptimal: earlier checkpoints in the pipeline can offer better speedup-to-binary-size trade-offs. Consider implementing "stop at trajectory-best" flags to avoid late-pipeline regressions, and augment cost models with dynamic hardware counter data, since IR instruction count is an unreliable runtime proxy. Focus optimization efforts on the identified Pareto-dominant passes.
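To make "Pareto-dominated on (size, speedup)" concrete, here is a minimal sketch of the dominance check between pipeline checkpoints. The `Checkpoint` type, the helper names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    prefix_len: int   # number of passes applied
    size_bytes: int   # binary size (smaller is better)
    speedup: float    # speedup over the baseline (larger is better)


def dominates(a: Checkpoint, b: Checkpoint) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.size_bytes <= b.size_bytes and a.speedup >= b.speedup
    strictly_better = a.size_bytes < b.size_bytes or a.speedup > b.speedup
    return no_worse and strictly_better


def dominators_of_final(checkpoints: List[Checkpoint]) -> List[Checkpoint]:
    """Earlier checkpoints that Pareto-dominate the last one."""
    final = checkpoints[-1]
    return [c for c in checkpoints[:-1] if dominates(c, final)]


# Hypothetical measurements for one kernel.
checkpoints = [
    Checkpoint(0, 16000, 1.0),
    Checkpoint(40, 16500, 1.4),
    Checkpoint(90, 17000, 2.1),
    Checkpoint(113, 18000, 2.0),  # full pipeline
]

for c in dominators_of_final(checkpoints):
    print(f"prefix {c.prefix_len} dominates the final configuration")
```

With these made-up values, the prefix of 90 passes is both smaller and faster than the full pipeline, so the final configuration is dominated.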
Key insights
LLVM's -O3 pipeline is non-monotone and back-loaded, with a few passes driving most gains and significant phase interference.
Principles
- Compiler pipelines are non-monotone and order-dependent.
- IR instruction count is an unreliable runtime proxy.
- Runtime-targeted passes often reduce energy for compute-bound kernels.
Method
Decompose an optimization pipeline into cumulative per-pass prefixes, then measure multi-dimensional metrics (runtime, compile time, binary size, hardware counters, energy) for each prefix.
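A minimal sketch of the prefix decomposition follows. The pass list and its order are illustrative only and are not the real -O3 sequence. Joining the names with commas mirrors the general shape of the string accepted by `opt`'s `-passes` option, but a real pipeline string may need explicit nesting of function and loop passes, which this sketch ignores. Whether the study also counts an empty baseline prefix is not stated here, so the sketch emits one prefix per pass.

```python
from typing import List


def cumulative_prefixes(passes: List[str]) -> List[List[str]]:
    """Return [p1], [p1, p2], ..., [p1, ..., pn] for an ordered pass list."""
    return [passes[:k] for k in range(1, len(passes) + 1)]


def as_pipeline_string(prefix: List[str]) -> str:
    """Comma-join pass names (flat form; nesting is deliberately ignored)."""
    return ",".join(prefix)


# Illustrative pass order, not the actual -O3 pipeline.
example_passes = ["sroa", "early-cse", "instcombine", "loop-vectorize"]

for prefix in cumulative_prefixes(example_passes):
    # Each prefix would be compiled and measured separately.
    print(len(prefix), as_pipeline_string(prefix))
```

Each printed line corresponds to one build that would be measured on every metric (runtime, compile time, binary size, hardware counters, energy).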
In practice
- Identify Pareto-dominant passes for targeted optimization.
- Implement "stop at trajectory-best" flags to avoid regressions.
- Augment cost models with dynamic or microarchitectural signals.
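The "stop at trajectory-best" idea can be sketched as an offline selection over measured prefixes. This is an assumption about how such a flag might behave, not an existing LLVM option; the function name and runtimes are hypothetical.

```python
from typing import List, Tuple


def trajectory_best(runtimes: List[float]) -> Tuple[int, float]:
    """Return (prefix index, runtime) of the fastest prefix.

    On ties the earliest prefix wins, since it needs fewer passes.
    """
    best_index = min(range(len(runtimes)), key=lambda i: runtimes[i])
    return best_index, runtimes[best_index]


# Hypothetical runtimes in seconds; index i is the runtime after i passes.
runtimes = [10.0, 9.0, 5.0, 5.5, 5.2]

best_index, best_runtime = trajectory_best(runtimes)
late_regression = runtimes[-1] / best_runtime - 1.0

print(f"stop after {best_index} passes")
print(f"full pipeline is {late_regression:.1%} slower than the best prefix")
```

A production version would need repeated measurements and a noise threshold before declaring one prefix faster than another.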
Topics
- LLVM
- Compiler Optimization
- Optimization Pipeline
- Phase Ordering
- Performance Analysis
- Hardware Counters
- Energy Efficiency
Best for: AI Scientist, Research Scientist, Software Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.