AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

2026-03-05 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

AOI (Autonomous Operations Intelligence) is a trainable multi-agent framework for automating Site Reliability Engineering (SRE) tasks. It targets three obstacles: restricted access to proprietary data, unsafe action execution, and the inability of closed systems to learn from failures. First, a trainable diagnostic system uses Group Relative Policy Optimization (GRPO) to distill expert knowledge into local open-source models without exposing sensitive data. Second, a read-write separated execution architecture decomposes operations into observation, reasoning, and action phases so that learning can proceed safely. Third, a Failure Trajectory Closed-Loop Evolver converts unsuccessful diagnostic trajectories into corrective supervision signals for continuous data augmentation. On the AIOpsLab benchmark, the AOI runtime alone reached 66.3% best@5 success on 86 tasks, 24.4 percentage points above the prior state of the art (41.9%). With Observer GRPO training, a 14B model surpassed Claude Sonnet 4.5 on held-out tasks, and the Evolver improved end-to-end avg@5 by 4.8 percentage points while reducing run-to-run variance by 35%.
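The best@5 and avg@5 figures above can be read as follows. This is a minimal sketch under the common interpretation of these metrics (best@k counts a task as solved if any of its k runs succeeds; avg@k averages the per-run success rate); the paper's exact definitions are assumed, not confirmed here, and the sample outcomes are hypothetical.

```python
from statistics import mean

def best_at_k(runs_per_task):
    """Fraction of tasks solved in at least one of their k runs."""
    return mean(1.0 if any(runs) else 0.0 for runs in runs_per_task)

def avg_at_k(runs_per_task):
    """Per-run success rate of each task, averaged over tasks."""
    return mean(
        mean(1.0 if ok else 0.0 for ok in runs) for runs in runs_per_task
    )

# Hypothetical outcomes: 3 tasks, 5 runs each (True = task solved).
results = [
    [True, False, False, True, False],   # solved in 2 of 5 runs
    [False, False, False, False, False], # never solved
    [True, True, True, True, True],      # always solved
]

print("best@5:", best_at_k(results))
print("avg@5: ", avg_at_k(results))
```

Under this reading, best@5 rewards occasional success while avg@5 rewards consistency, which is why a reduction in run-to-run variance tends to show up in avg@5 rather than best@5.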

Key takeaway

For AI Scientists and Research Scientists developing autonomous SRE agents, this work indicates that enforcing safety architecturally through read-write separation, and using failed diagnostic trajectories as training signals, improves both performance and robustness. Consider combining GRPO-based training, learning from failed trajectories, and a multi-agent architecture to improve diagnostic precision and reduce run-to-run variance, especially for complex, multi-step reasoning tasks such as Root Cause Analysis.

Key insights

Failed diagnostic trajectories are valuable training signals for improving autonomous SRE agents.

Principles

Architectural safety enhances capability.
Read-write separation prevents cascading failures.
GRPO enables learning from diverse valid paths.
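The last principle follows from how GRPO scores a group of sampled trajectories against each other. The sketch below shows the group-relative advantage from the general GRPO formulation (reward minus group mean, divided by group standard deviation); it is not AOI's training code, and the reward values and the `eps` stabilizer are illustrative assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group: four sampled diagnoses for one incident.
# Two different trajectories both reach a correct diagnosis (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Both successful samples receive the same positive advantage even if they took different diagnostic routes, so the policy is not pushed toward a single reference path. When every sample in a group gets the same reward, all advantages are zero and that group contributes no gradient.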

Method

AOI uses a multi-agent system (Observer, Probe, Executor) with read-write separation, GRPO for policy optimization, and a Failure Trajectory Evolver to convert failed diagnostic sequences into corrective guidance for continuous learning.
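One way to picture read-write separation is as two components with disjoint permissions. The sketch below is an illustration only: the class names reuse the agent names from the summary, but the role assignment (Probe as the read path, Executor as the write path), the verb allowlist, and the approval flag are all assumptions, not AOI's actual interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist of non-mutating command verbs.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}

class ReadOnlyViolation(Exception):
    """Raised when the read path is asked to run a mutating command."""

class Probe:
    """Read path: may only run allowlisted, non-mutating commands."""

    def run(self, command: str) -> str:
        parts = command.split()
        verb = parts[0] if parts else ""
        if verb not in READ_ONLY_VERBS:
            raise ReadOnlyViolation(f"refused on read path: {command!r}")
        return f"[simulated output of: {command}]"

@dataclass
class Executor:
    """Write path: applies only actions that were explicitly approved."""
    applied: list = field(default_factory=list)

    def apply(self, action: str, approved: bool) -> bool:
        if not approved:
            return False  # unapproved actions never reach the system
        self.applied.append(action)
        return True

probe, executor = Probe(), Executor()
print(probe.run("get pods"))                         # observation is allowed
print(executor.apply("restart svc", approved=False)) # action is gated
```

Because exploration happens only through the read path, a poor diagnostic step can waste effort but cannot change system state; mutations pass through a single gated entry point.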

In practice

Implement read-write separation for SRE automation.
Use GRPO to distill expert knowledge into local LLMs.
Convert failed diagnostic attempts into training data.
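The third item can be sketched as a small data transformation. This is a hypothetical illustration, not the Evolver itself: the summary does not describe how corrections are produced, so the `corrections` mapping, the `Trajectory` fields, and the prompt/target record layout are all assumed for the example.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    task_id: str
    steps: list[str]  # commands or reasoning steps, in order
    success: bool

def to_corrective_examples(trajectories, corrections):
    """Pair each failed trajectory with its correction; skip the rest."""
    examples = []
    for t in trajectories:
        if t.success or t.task_id not in corrections:
            continue  # only failures with a known correction become data
        examples.append({
            "task_id": t.task_id,
            "prompt": "\n".join(t.steps),     # what the agent actually did
            "target": corrections[t.task_id], # corrective guidance
        })
    return examples

# Hypothetical trajectories and corrections.
runs = [
    Trajectory("t1", ["get pods", "logs web-0"], success=False),
    Trajectory("t2", ["get pods"], success=True),
    Trajectory("t3", ["describe node n1"], success=False),
]
corrections = {"t1": "Inspect the database service before the web tier."}

dataset = to_corrective_examples(runs, corrections)
print(len(dataset), dataset[0]["task_id"])
```

Failures without a correction are skipped rather than guessed at, which keeps the augmented dataset limited to examples that carry an explicit supervision signal.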

Topics

LLM Agents
AIOps
Multi-Agent Systems
Group Relative Policy Optimization
Failure Trajectory Learning

Best for: AI Scientist, Research Scientist, MLOps Engineer, AI Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.