Goal-Autopilot: A Verifiable Anti-Fabrication Firewall for Unattended Long-Horizon Agents
Summary
Goal-Autopilot is an execution model designed as a verifiable anti-fabrication firewall for unattended long-horizon LLM agents, which tend to confidently report success they have not verified. It prevents fabricated success structurally: all working state is externalized into a durable, gated finite-state machine (FSM), and a scheduler advances the agent one stateless tick at a time. A hard floor forbids any terminal "done" claim unless its falsifiable gate actually executed and passed. The authors prove a "No-False-Success" theorem: under the stated conditions, termination implies that the goal was achieved. Across a 3,150-cell evaluation corpus, Autopilot's fabrication rate was 0.95% [95% CI 0.38–1.62], versus 8.10% [6.48–9.81] for Reflexion and 25.05% [22.48–27.62] for StateFlow, with non-overlapping confidence intervals. On SWE-bench Lite, fabrication fell from 33.7% (StateFlow) to 0.67%, a reported paired difference of -33.07 pp. The design deliberately trades coverage for honesty, accepting an honest stall over a confident wrong output. Per-step context cost stays constant regardless of horizon length.
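To make "fabrication" concrete, the sketch below classifies each run by whether the agent claimed completion and whether an independent check confirmed the goal. It assumes the rate is computed over all runs; the source does not spell out its exact denominator, and the names `RunOutcome`, `classify`, and `fabrication_rate` are hypothetical. Note that an honest stall never counts against the agent, which is exactly the trade-off described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunOutcome:
    claimed_done: bool   # did the agent report terminal success?
    goal_achieved: bool  # did an independent ground-truth check pass?


def classify(run: RunOutcome) -> str:
    """Label one run; an honest stall is not counted as fabrication."""
    if run.claimed_done:
        return "verified_success" if run.goal_achieved else "fabrication"
    return "unclaimed_success" if run.goal_achieved else "honest_stall"


def fabrication_rate(runs: list) -> float:
    """Assumed definition: fraction of all runs that claimed done but missed the goal."""
    if not runs:
        raise ValueError("fabrication_rate needs at least one run")
    return sum(classify(r) == "fabrication" for r in runs) / len(runs)


runs = [
    RunOutcome(claimed_done=True, goal_achieved=True),
    RunOutcome(claimed_done=True, goal_achieved=False),   # confident wrong output
    RunOutcome(claimed_done=False, goal_achieved=False),  # honest stall
    RunOutcome(claimed_done=False, goal_achieved=False),  # honest stall
]
print(fabrication_rate(runs))  # 0.25
```

Under this definition, converting a confident wrong answer into a stall lowers the fabrication rate without raising the success rate, which is why honesty and coverage are reported as separate quantities.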
Key takeaway
For machine learning engineers deploying long-horizon LLM agents in unattended environments, agents that fabricate success are a critical risk. Goal-Autopilot shows that a structural fix works: it cut fabrication on SWE-bench Lite from 33.7% (StateFlow) to 0.67%. Adopt execution models that externalize state and enforce gated completion, so that autonomous systems deliver honest rather than merely confident results and unverified agent claims do not propagate into costly downstream errors.
Key insights
Goal-Autopilot structurally prevents LLM agent fabrication by enforcing verifiable state transitions and terminal conditions.
Principles
- Treat honesty as a first-class metric for unattended autonomy.
- Externalize agent working state into a durable, gated FSM.
- Enforce a hard floor that forbids unverified terminal claims.
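Externalizing working state means that anything the agent needs on the next tick must live in a durable record rather than in the model's context. A minimal sketch, assuming a single JSON file per goal: the `GoalState` fields and the `save_state`/`load_state` helpers are hypothetical, not the paper's schema. Writing to a temporary file and swapping it in with `os.replace` keeps a crash mid-write from corrupting the record.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class GoalState:
    """Hypothetical persisted record: everything the agent may rely on between ticks."""
    goal: str
    state: str = "PLAN"
    gate_results: dict = field(default_factory=dict)  # gate name -> passed?
    notes: list = field(default_factory=list)


def save_state(path: str, st: GoalState) -> None:
    """Write to a temp file in the same directory, then swap it in, so a
    crash mid-write never leaves a half-written state file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(asdict(st), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    finally:
        if os.path.exists(tmp):  # only true if something failed before the swap
            os.remove(tmp)


def load_state(path: str) -> GoalState:
    """Rehydrate the FSM record; nothing else survives between ticks."""
    with open(path, encoding="utf-8") as f:
        return GoalState(**json.load(f))
```

Because every tick starts from `load_state`, the per-step prompt only needs this record, not the full transcript, which is consistent with the constant per-step context cost the summary reports.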
Method
A scheduler advances a stateless agent one tick at a time, rehydrating only the persisted state machine at each tick. A hard floor rejects any "done" claim unless the corresponding gate actually executed and passed.
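The tick loop and hard floor can be sketched as a pure function over a small FSM record. This is an illustration under assumptions, not the authors' implementation: the states (`PLAN`, `WORK`, `VERIFY`, `DONE`, `STALLED`), the `ALLOWED` transition table, and the `tick`/`run` helpers are hypothetical. The key property is that `DONE` is reachable only through the branch where the scheduler itself ran the gate and saw it execute and pass; a tick budget turns an unreachable goal into an honest stall.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical transition table; the paper's actual states and edges may differ.
ALLOWED = {
    "PLAN": {"WORK"},
    "WORK": {"WORK", "VERIFY"},
    "VERIFY": {"WORK", "DONE"},
}
TERMINAL = {"DONE", "STALLED"}


@dataclass(frozen=True)
class GateResult:
    executed: bool  # did the falsifiable check actually run?
    passed: bool    # did it pass?


@dataclass(frozen=True)
class FSMState:
    state: str = "PLAN"
    ticks: int = 0
    last_gate: Optional[GateResult] = None


def tick(st: FSMState, agent: Callable[[FSMState], str],
         run_gate: Callable[[], GateResult], max_ticks: int) -> FSMState:
    """One stateless step: the agent sees only the rehydrated FSM record."""
    if st.state in TERMINAL:
        return st
    if st.ticks >= max_ticks:
        return replace(st, state="STALLED")  # honest stall, not a guess
    proposed = agent(st)
    nxt = replace(st, ticks=st.ticks + 1)
    if proposed not in ALLOWED[st.state]:
        return nxt  # illegal transition: ignored, tick still consumed
    if proposed == "DONE":
        gate = run_gate()  # run by the scheduler, never self-reported by the agent
        if gate.executed and gate.passed:
            return replace(nxt, state="DONE", last_gate=gate)
        return replace(nxt, state="WORK", last_gate=gate)  # hard floor: no DONE
    return replace(nxt, state=proposed)


def run(agent, run_gate, max_ticks: int = 20) -> FSMState:
    st = FSMState()
    while st.state not in TERMINAL:
        st = tick(st, agent, run_gate, max_ticks)  # in practice: load, tick, persist
    return st


def eager_agent(st: FSMState) -> str:
    """A toy agent that always pushes toward completion."""
    return {"PLAN": "WORK", "WORK": "VERIFY", "VERIFY": "DONE"}[st.state]


print(run(eager_agent, lambda: GateResult(True, True)).state)   # DONE
print(run(eager_agent, lambda: GateResult(True, False)).state)  # STALLED
print(run(eager_agent, lambda: GateResult(False, True)).state)  # STALLED
```

The third call shows why the floor checks `executed` as well as `passed`: a gate that never ran cannot certify anything, so an eager agent ends in `STALLED` rather than a false `DONE`.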
In practice
- Implement gated finite-state machines for agent verification.
- Prioritize verifiable honesty over raw capability in critical agent tasks.
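A gated FSM is only as trustworthy as its gates, so a gate should record evidence that it actually ran, not just a pass/fail flag. A minimal sketch, assuming a gate is an external command (such as a test suite) whose exit status decides the outcome; `GateEvidence` and `run_gate` are hypothetical names, while `subprocess.run` and its `capture_output`, `text`, and `timeout` parameters are the standard library API.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEvidence:
    executed: bool  # did the check run to completion?
    passed: bool    # did it exit with status 0?
    detail: str


def run_gate(cmd: list, timeout_s: float = 600.0) -> GateEvidence:
    """Run a falsifiable check (e.g. a test suite) and record what happened.

    A missing binary or a timeout counts as 'not executed'; a hard floor
    should treat that exactly like a failing check."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except OSError as exc:
        return GateEvidence(False, False, f"not executed: {exc}")
    except subprocess.TimeoutExpired:
        return GateEvidence(False, False, f"not executed: timed out after {timeout_s}s")
    return GateEvidence(True, proc.returncode == 0, f"exit code {proc.returncode}")


if __name__ == "__main__":
    print(run_gate([sys.executable, "-c", "raise SystemExit(0)"]))  # runs and passes
    print(run_gate([sys.executable, "-c", "raise SystemExit(1)"]))  # runs and fails
    print(run_gate(["no-such-gate-binary"]))                          # never runs
```

Separating "did not execute" from "executed and failed" keeps the audit trail honest even though both outcomes block a terminal claim.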
Topics
- Goal-Autopilot
- LLM Agents
- Agent Fabrication
- Verifiable Autonomy
- Finite-State Machines
- SWE-bench Lite
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.