AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
Summary
AOI (Autonomous Operations Intelligence) is a trainable multi-agent framework designed to automate Site Reliability Engineering (SRE) tasks, addressing challenges like restricted access to proprietary data, unsafe action execution, and the inability of closed systems to learn from failures. The framework integrates a trainable diagnostic system using Group Relative Policy Optimization (GRPO) to distill expert knowledge into local open-source models without exposing sensitive data. It features a read-write separated execution architecture, decomposing operations into observation, reasoning, and action phases for safe learning. Additionally, a Failure Trajectory Closed-Loop Evolver converts unsuccessful diagnostic trajectories into corrective supervision signals for continuous data augmentation. Evaluated on the AIOpsLab benchmark, AOI's runtime alone achieved 66.3% best@5 success on 86 tasks, outperforming the prior state-of-the-art (41.9%) by 24.4 percentage points. With Observer GRPO training, a 14B model surpassed Claude Sonnet 4.5 on held-out tasks, and the Evolver improved end-to-end avg@5 by 4.8 percentage points while reducing run-to-run variance by 35%.
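The summary reports both best@5 and avg@5 without defining them. The sketch below uses one common reading, which is an assumption rather than something stated in the source: best@5 counts a task as solved if any of its five runs succeeds, and avg@5 averages the per-run success rate. The task names and outcomes are made up for illustration; only the final arithmetic check uses numbers from the summary.

```python
from typing import Dict, List


def best_at_k(runs: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved in at least one of their k runs (assumed definition)."""
    return sum(any(outcomes) for outcomes in runs.values()) / len(runs)


def avg_at_k(runs: Dict[str, List[bool]]) -> float:
    """Per-task success rate over k runs, averaged across tasks (assumed definition)."""
    return sum(sum(o) / len(o) for o in runs.values()) / len(runs)


# Hypothetical outcomes for three tasks, five runs each (not data from the paper).
runs = {
    "task_a": [True, False, True, True, False],
    "task_b": [False, False, False, False, False],
    "task_c": [False, True, False, False, False],
}

best = best_at_k(runs)  # task_a and task_c each have at least one success
avg = avg_at_k(runs)    # mean of 0.6, 0.0 and 0.2

# Gap quoted in the summary, in percentage points: 66.3% vs. 41.9%.
gap_pp = round(66.3 - 41.9, 1)
```

Under these definitions best@k can only be greater than or equal to avg@k, which is why a gain in avg@5 together with lower run-to-run variance says more about consistency than a best@5 gain does.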
Key takeaway
For AI Scientists and Research Scientists developing autonomous SRE agents, this work shows that enforcing safety architecturally through read-write separation, and treating failed diagnostic trajectories as training signals, improves both performance and robustness. Consider combining GRPO-based training of a local model, a failure-driven data augmentation loop, and a multi-agent architecture to improve diagnostic precision and reduce run-to-run variance, especially for complex, multi-step reasoning tasks such as Root Cause Analysis.
Key insights
Failed diagnostic trajectories are valuable training signals for improving autonomous SRE agents.
Principles
- Architectural safety enhances capability.
- Read-write separation prevents cascading failures.
- GRPO enables learning from diverse valid paths.
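The GRPO principle above rests on scoring each sampled trajectory relative to the other samples for the same prompt, so several different valid paths can all receive positive credit. A minimal sketch of that group-relative normalization follows. It shows the general GRPO advantage step only, not AOI's reward design, and the reward values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead, so treat this choice as an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for the same incident.
rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)

# A group where every sample scores the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0])
```

Because the baseline is the group mean rather than a learned value function, a trajectory is reinforced only when it does better than its siblings, whichever path it took to get there.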
Method
AOI uses a multi-agent system (Observer, Probe, Executor) with read-write separation, GRPO for policy optimization, and a Failure Trajectory Closed-Loop Evolver that converts failed diagnostic sequences into corrective guidance for continuous learning.
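One way to picture read-write separation is a gate that lets diagnostic commands through a read path and sends anything that mutates state through a separate, explicitly approved write path. The sketch below is an illustration under stated assumptions, not AOI's implementation: it assumes a Kubernetes setting, and the function names, the allow-list, and the `approved` flag are all hypothetical. The verbs listed are standard `kubectl` subcommands.

```python
import shlex
from typing import Callable

# Assumed allow-list of kubectl verbs that only read cluster state.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}


class WriteNotPermitted(Exception):
    """Raised when a command is refused by the read path or lacks write approval."""


def kubectl_verb(command: str) -> str:
    """Return the token right after 'kubectl'; anything else is rejected."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "kubectl":
        raise ValueError(f"not a kubectl command: {command!r}")
    return tokens[1]


def run_read_only(command: str, runner: Callable[[str], str]) -> str:
    """Read path: fails closed on any verb outside the allow-list.

    Commands that put flags before the verb (e.g. 'kubectl -n prod get pods')
    are also refused, because the second token is then not an allowed verb.
    """
    if kubectl_verb(command) not in READ_ONLY_VERBS:
        raise WriteNotPermitted(command)
    return runner(command)


def run_write(command: str, runner: Callable[[str], str], approved: bool) -> str:
    """Write path: runs only when the caller supplies explicit approval."""
    kubectl_verb(command)  # still require a well-formed kubectl command
    if not approved:
        raise WriteNotPermitted(command)
    return runner(command)
```

Passing the `runner` in as a parameter keeps the gate testable with a stub and makes it clear that the exploring side never holds a handle capable of mutating the cluster.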
In practice
- Implement read-write separation for SRE automation.
- Use GRPO to distill expert knowledge into local LLMs.
- Convert failed diagnostic attempts into training data.
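The last item above, turning failed attempts into training data, can be sketched as a small conversion step. Everything here is hypothetical: the record types, the prompt wording, and the assumption that a corrective hint is supplied by some external source such as a stronger model or a human reviewer. The source does not describe how the Evolver produces its corrections, so this shows only the general shape of the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    command: str
    observation: str


@dataclass
class Trajectory:
    task: str
    steps: List[Step]
    succeeded: bool


@dataclass
class TrainingExample:
    prompt: str
    target: str


def to_corrective_example(traj: Trajectory, correction: str) -> Optional[TrainingExample]:
    """Pair a failed trajectory with a correction; successful runs are skipped."""
    if traj.succeeded:
        return None
    history = "\n".join(f"$ {s.command}\n{s.observation}" for s in traj.steps)
    prompt = (
        f"Task: {traj.task}\n"
        f"Failed attempt:\n{history}\n"
        "What should be done instead?"
    )
    return TrainingExample(prompt=prompt, target=correction)


# Hypothetical failed run: the agent stopped at pod status and never read the logs.
failed = Trajectory(
    task="Find the root cause of checkout errors",
    steps=[Step("kubectl get pods", "checkout-7d9 0/1 CrashLoopBackOff")],
    succeeded=False,
)
example = to_corrective_example(failed, "Inspect the container logs before concluding.")
```

Keeping the failed history in the prompt, rather than discarding it, is what lets the model learn where the attempt went wrong and not just what a good answer looks like.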
Topics
- LLM Agents
- AIOps
- Multi-Agent Systems
- Group Relative Policy Optimization
- Failure Trajectory Learning
Best for: AI Scientist, Research Scientist, MLOps Engineer, AI Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.