FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, quick

Summary

FAPO (Fully Autonomous Prompt Optimization) is a framework for optimizing multi-step LLM pipelines, targeting failures that arise from interactions among retrieval, reasoning, and formatting steps. Using Claude Code inside a standardized codebase, FAPO evaluates the pipeline, inspects intermediate stages, diagnoses failures, proposes targeted changes, and validates each variant against a score function. It starts with prompt edits and escalates to structural modifications only when prompt optimization proves insufficient and a structural bottleneck has been identified. FAPO outperforms the GEPA baseline in 15 of 18 model-benchmark comparisons, with a mean gain of +14.1 pp. On HoVer and IFBench, where structural changes were implemented, the mean gain is +33.8 pp. On the security benchmark CTIBench-RCM, test accuracy improves by +4.0 pp on GPT-5, +7.1 pp on Foundation-Sec-8B-Instruct, and +2.0 pp on Foundation-Sec-8B-Reasoning. The reported results therefore cover both general-purpose and security-focused tasks.

Key takeaway

For Machine Learning Engineers building multi-step LLM pipelines, FAPO offers a way past performance bottlenecks that prompt-only optimization cannot resolve. If prompt tuning alone has stalled, consider FAPO to autonomously diagnose and refine both prompts and pipeline structure. Reported gains are +14.1 pp on average and +33.8 pp on HoVer and IFBench, where structural changes were applied. On the security task CTIBench-RCM (CVE-to-CWE mapping), gains range from +2.0 pp to +7.1 pp depending on the model.

Key insights

FAPO autonomously optimizes multi-step LLM pipelines by iteratively diagnosing and refining prompts or structural components.

Principles

Method

Using Claude Code, FAPO runs a repeated loop: it evaluates the pipeline, inspects intermediate steps, diagnoses failures, proposes a scoped change (prompt edits first, structural changes only when prompt edits are insufficient), and validates the resulting variant against a score function.
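The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FAPO's actual implementation: the names `optimize`, `propose_prompt_edit`, `propose_structural_change`, and the `patience` threshold are hypothetical, the pipeline is modelled as a plain list of (step name, prompt) pairs, and the toy score function and proposers stand in for the evaluation and diagnosis that the summary attributes to Claude Code.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a pipeline is an ordered list of (step_name, prompt) pairs.
Pipeline = List[Tuple[str, str]]


def optimize(
    pipeline: Pipeline,
    score: Callable[[Pipeline], int],
    propose_prompt_edit: Callable[[Pipeline], Pipeline],
    propose_structural_change: Callable[[Pipeline], Optional[Pipeline]],
    max_rounds: int = 10,
    patience: int = 2,  # hypothetical: failed prompt edits tolerated before escalating
) -> Tuple[Pipeline, int, List[str]]:
    """Prompt edits first; escalate to structural changes only after they stall."""
    best, best_score = pipeline, score(pipeline)
    stalled = 0
    accepted: List[str] = []  # kinds of changes that passed validation
    for _ in range(max_rounds):
        if stalled < patience:
            kind, candidate = "prompt", propose_prompt_edit(best)
        else:
            candidate = propose_structural_change(best)
            if candidate is None:  # no structural bottleneck diagnosed
                break
            kind = "structural"
        candidate_score = score(candidate)  # validate the variant
        if candidate_score > best_score:
            best, best_score, stalled = candidate, candidate_score, 0
            accepted.append(kind)
        else:
            stalled += 1
    return best, best_score, accepted


# --- Toy stand-ins (all hypothetical) so the sketch runs end to end ---
def toy_score(p: Pipeline) -> int:
    # 10 points per step, 5 more for each prompt that asks for evidence.
    return 10 * len(p) + 5 * sum("cite evidence" in prompt for _, prompt in p)


def toy_prompt_edit(p: Pipeline) -> Pipeline:
    # Edit the first prompt that does not yet ask for evidence.
    edited = list(p)
    for i, (name, prompt) in enumerate(edited):
        if "cite evidence" not in prompt:
            edited[i] = (name, prompt + " Always cite evidence.")
            break
    return edited


def toy_structural_change(p: Pipeline) -> Optional[Pipeline]:
    # Add a verification step once; afterwards report no bottleneck.
    if any(name == "verify" for name, _ in p):
        return None
    return list(p) + [("verify", "Check the answer.")]


start: Pipeline = [("retrieve", "Find documents."), ("answer", "Answer the question.")]
best, best_score, accepted = optimize(
    start, toy_score, toy_prompt_edit, toy_structural_change
)
print(accepted)  # ['prompt', 'prompt', 'structural', 'prompt']
```

In this toy run the two prompt edits are accepted, two further prompt edits fail to improve the score, and only then is the structural change (an added verification step) tried and accepted. That ordering mirrors the escalation policy in the summary; the specific stopping rule here is an assumption.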

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.