FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
Summary
FAPO (Fully Autonomous Prompt Optimization) is a framework that uses Claude Code to optimize multi-step LLM pipelines: it evaluates performance, inspects intermediate steps, diagnoses failures, and proposes iterative changes. It prioritizes prompt edits but escalates to structural modifications when failure attribution identifies deeper bottlenecks. Across six benchmarks and three task models, FAPO surpassed the GEPA baseline in 15 of 18 comparisons, with a mean gain of +14.1 pp. On HoVer and IFBench, where structural changes were applied, the mean gain was +33.8 pp. On the security benchmark CTIBench-RCM, prompt-only FAPO raised test accuracy by +4.0 pp on GPT-5, +7.1 pp on Foundation-Sec-8B-Instruct, and +2.0 pp on Foundation-Sec-8B-Reasoning.
Key takeaway
For AI engineers building multi-step LLM pipelines, FAPO offers a framework for autonomous optimization that extends beyond prompt tuning. Consider adopting its evidence-grounded, prompt-first approach to diagnose and resolve pipeline bottlenecks, including structural ones. The reported results suggest this can improve performance on complex tasks such as multi-hop QA and security classification.
Key insights
FAPO autonomously optimizes multi-step LLM pipelines by diagnosing failures and iteratively applying prompt or structural changes.
Principles
- Separate the shared tester from task logic.
- Ground decisions in recorded evidence.
- Prefer the smallest useful change (prompt-first).
Method
FAPO evaluates a pipeline, records step-level evidence, attributes failures, proposes a scoped variant (a prompt edit or a structural change), and reviews it. It then iterates on variants that improve performance, escalating to structural changes when prompt edits prove insufficient.
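The loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not FAPO's actual implementation: the `Pipeline` class, the `evaluate` and `optimize` functions, the proposer and reviewer callbacks, and the `patience` rule for escalation are all hypothetical names and assumptions introduced here. In FAPO the proposals come from Claude Code and escalation is driven by failure attribution. Here the proposers are plain callables and escalation is approximated by counting rounds without improvement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a step maps (prompt, input_text) to output_text.
Step = Callable[[str, str], str]
Trace = List[Tuple[str, str]]  # (step name, step output) per step


@dataclass
class Pipeline:
    steps: List[Tuple[str, Step]]  # ordered (name, function) pairs
    prompts: Dict[str, str]        # one prompt per step name

    def run(self, x: str) -> Tuple[str, Trace]:
        """Run all steps in order and record every intermediate output."""
        trace: Trace = []
        for name, fn in self.steps:
            x = fn(self.prompts[name], x)
            trace.append((name, x))
        return x, trace


def evaluate(pipeline: Pipeline, dataset: List[Tuple[str, str]]):
    """Shared tester, kept separate from task logic.

    Returns accuracy plus the step-level traces of failed examples.
    """
    correct, evidence = 0, []
    for x, gold in dataset:
        out, trace = pipeline.run(x)
        if out == gold:
            correct += 1
        else:
            evidence.append(trace)
    return correct / len(dataset), evidence


Proposer = Callable[[Pipeline, List[Trace]], Optional[Pipeline]]


def optimize(pipeline: Pipeline, dataset, propose_prompt_edit: Proposer,
             propose_structural_edit: Proposer,
             review: Callable[[Pipeline, Pipeline], bool],
             max_iters: int = 5, patience: int = 2):
    """Prompt-first search; escalate after `patience` stalled rounds (assumed rule)."""
    best = pipeline
    best_score, evidence = evaluate(best, dataset)
    stalled = 0
    for _ in range(max_iters):
        if not evidence:  # no failures left to diagnose
            break
        # Smallest useful change first; structural edits only once prompts stall.
        propose = propose_prompt_edit if stalled < patience else propose_structural_edit
        candidate = propose(best, evidence)
        if candidate is None or not review(best, candidate):
            stalled += 1
            continue
        score, cand_evidence = evaluate(candidate, dataset)
        if score > best_score:  # keep only variants that measurably improve
            best, best_score, evidence = candidate, score, cand_evidence
            stalled = 0
        else:
            stalled += 1
    return best, best_score
```

A caller would supply the dataset and the three callbacks. Each proposer receives the recorded failure traces, so every proposed variant can be grounded in step-level evidence rather than in the final score alone.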
In practice
- Optimize multi-hop QA pipelines.
- Improve security CVE-to-CWE classification.
- Enhance fact-verification and instruction-following.
Topics
- Multi-step LLM Pipelines
- Prompt Optimization
- Autonomous Agents
- Claude Code
- Pipeline Optimization
- Failure Attribution
- Security Classification
Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.