FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
Summary
FAPO (Fully Autonomous Prompt Optimization) is a framework for optimizing multi-step LLM pipelines, where failures often arise from interactions across retrieval, reasoning, and formatting steps rather than from any single step. Running Claude Code inside a standardized codebase, FAPO evaluates the pipeline, inspects intermediate stages, diagnoses failures, proposes targeted changes, and validates each variant against a score function. It starts with prompt edits and escalates to structural modifications only when prompt optimization proves insufficient and a structural bottleneck has been identified. FAPO outperforms the GEPA baseline in 15 of 18 model-benchmark comparisons, with a mean gain of +14.1 pp. On HoVer and IFBench, where structural changes were implemented, the mean gain was +33.8 pp. On the security benchmark CTIBench-RCM, test accuracy improved by +4.0 pp on GPT-5, +7.1 pp on Foundation-Sec-8B-Instruct, and +2.0 pp on Foundation-Sec-8B-Reasoning. Together, these results indicate that FAPO is effective on both general-purpose and security-focused tasks.
Key takeaway
For Machine Learning Engineers building multi-step LLM pipelines, FAPO offers a way past performance bottlenecks that prompt-only optimization cannot resolve. It autonomously diagnoses failures and refines prompts first, then pipeline structure when needed. Reported gains over GEPA average +14.1 pp across all comparisons and +33.8 pp on HoVer and IFBench, the two benchmarks where structural changes were made. On the security task of CVE-to-CWE mapping (CTIBench-RCM), gains are smaller, ranging from +2.0 pp to +7.1 pp depending on the model.
Key insights
FAPO autonomously optimizes multi-step LLM pipelines by iteratively diagnosing and refining prompts or structural components.
Principles
- Pipeline failures stem from inter-step interactions.
- Prioritize prompt edits before structural changes.
- Iterative diagnosis and validation drive optimization.
Method
Using Claude Code, FAPO repeatedly evaluates the pipeline, inspects intermediate steps, diagnoses failures, proposes scoped changes (prompt edits first, structural changes only if those fall short), and validates each variant against a score function.
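The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not FAPO's actual implementation: the function name `optimize`, the two proposer callbacks, and the `patience` rule for escalating to structural changes are all hypothetical. In FAPO the inspection and diagnosis are carried out by Claude Code; here they are abstracted into the callbacks, and a toy pipeline stands in for a real one.

```python
from typing import Callable, Optional, Tuple, TypeVar

P = TypeVar("P")  # a pipeline variant (its prompts plus its structure)


def optimize(
    pipeline: P,
    score: Callable[[P], float],
    propose_prompt_edit: Callable[[P], Optional[P]],
    propose_structural_change: Callable[[P], Optional[P]],
    max_rounds: int = 10,
    patience: int = 2,
) -> P:
    """Hypothetical sketch: keep a variant only if it beats the current best score."""
    best, best_score = pipeline, score(pipeline)
    stalled = 0  # consecutive rounds without improvement
    for _ in range(max_rounds):
        # Assumption: "prompt edits prove insufficient" is modeled as
        # `patience` rounds in a row that fail to improve the score.
        escalate = stalled >= patience
        proposer = propose_structural_change if escalate else propose_prompt_edit
        candidate = proposer(best)
        if candidate is None:
            if escalate:
                break  # nothing left to try
            stalled = patience  # no prompt edits left, so escalate
            continue
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            stalled = 0  # improvement: go back to prompt edits
        else:
            stalled += 1
    return best


# Toy stand-ins: a pipeline is (prompt_quality, n_steps).
ToyPipeline = Tuple[int, int]


def toy_score(p: ToyPipeline) -> float:
    quality, n_steps = p
    return min(quality, 3) + 2 * n_steps  # prompt gains saturate at quality 3


def toy_prompt_edit(p: ToyPipeline) -> Optional[ToyPipeline]:
    return (p[0] + 1, p[1])


def toy_structural_change(p: ToyPipeline) -> Optional[ToyPipeline]:
    return (p[0], p[1] + 1) if p[1] < 2 else None


result = optimize((0, 1), toy_score, toy_prompt_edit, toy_structural_change)
```

In the toy run, prompt edits raise the score until they saturate, two stalled rounds trigger one structural change, and the loop stops once no further structural change is available. A real setup would replace the stall counter with an explicit diagnosis of a structural bottleneck, as the summary describes.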
In practice
- Apply FAPO to improve LLM pipeline accuracy.
- Consider FAPO for security-focused CVE-to-CWE tasks.
- Use Claude Code for autonomous pipeline optimization.
Topics
- LLM Pipelines
- Prompt Optimization
- Claude Code
- Automated Optimization
- Security Tasks
- Pipeline Architecture
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.