OpenRCA 2.0: From Outcome Labels to Causal Process Supervision
Summary
OpenRCA 2.0 is presented as the first cross-system root cause analysis (RCA) benchmark with step-wise causal annotations for large language model (LLM) agents. Existing RCA datasets typically label only the root cause and leave out the propagation path from that cause to the observed symptom. To address this, OpenRCA 2.0 uses PAVE, a step-wise labeling protocol that starts from known fault injection interventions and reconstructs causal propagation paths through forward verification. The benchmark comprises 500 instances. Across 11 frontier LLMs evaluated on OpenRCA 2.0, agents recover the exact root-cause set in only 20.7% of cases on average. Further analysis identified a failure mode called "ungrounded diagnosis": agents correctly identify at least one root-cause service in 76.0% of cases but ground it in a verified causal propagation path in only 61.5%. Outcome-only evaluation therefore masks critical failure modes, and reliable LLM-based RCA needs step-wise causal ground truth.
Key takeaway
MLOps engineers evaluating LLM agents for root cause analysis should move beyond outcome-only metrics. Such evaluations can mask "ungrounded diagnosis," where an agent names a correct root cause but does not support it with a verified causal path. Benchmarks like OpenRCA 2.0, which provide step-wise causal ground truth, make it possible to check that an agent's diagnosis is verifiable as well as correct.
Key insights
Outcome-only root cause analysis evaluation for LLMs hides critical "ungrounded diagnosis" failures, necessitating step-wise causal path supervision.
Principles
- Existing RCA datasets often label only the root cause, not the propagation path.
- Forward verification reconstructs causal paths.
- Outcome-only evaluation hides reasoning failures.
Method
PAVE is a step-wise labeling protocol that uses known fault injection to reconstruct causal propagation paths via forward verification, reasoning from cause to effect.
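The cause-to-effect direction of forward verification can be illustrated with a small sketch. This is not the PAVE protocol itself, whose details are not given here; it is a minimal example under stated assumptions: a known propagation graph (`edges`), a known injected service, a known symptom service, and a set of services already confirmed as anomalous. All names and the toy topology are hypothetical.

```python
from collections import deque


def forward_verified_paths(edges, injected, symptom, anomalous):
    """Walk forward from the injected fault and keep only paths whose
    every service is in the (assumed) set of confirmed-anomalous services.

    edges: (cause, effect) pairs in the direction a fault would propagate.
    """
    if injected not in anomalous:
        return []

    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    paths = []
    queue = deque([[injected]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == symptom:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            # Extend only through services with verified anomalies,
            # and never revisit a service already on this path.
            if nxt in anomalous and nxt not in path:
                queue.append(path + [nxt])
    return paths


# Hypothetical topology: a fault injected into "db" surfaces at "frontend".
edges = [
    ("db", "cart"),
    ("cart", "checkout"),
    ("checkout", "frontend"),
    ("db", "search"),
    ("search", "frontend"),
]
anomalous = {"db", "cart", "checkout", "frontend"}  # "search" stayed healthy

paths = forward_verified_paths(edges, "db", "frontend", anomalous)
print(paths)
```

In this toy case the branch through "search" is discarded because no anomaly was confirmed there, so only the path through "cart" and "checkout" survives. Starting from the known intervention means the search never has to guess the cause from the symptom.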
In practice
- Evaluate LLMs with OpenRCA 2.0.
- Annotate incidents with step-wise causal ground truth.
- Ground root causes in propagation paths.
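The gap between outcome-only and path-aware scoring can be sketched as three per-case checks: exact root-cause set match, at least one correct root cause, and grounding in a verified path. The scoring rules below are assumptions for illustration, not the benchmark's official metric definitions. In particular, "grounded" is taken here to mean that the predicted path equals one verified path and starts at a correctly predicted root cause. The data layout and service names are hypothetical.

```python
def score_case(pred_roots, pred_path, true_roots, true_paths):
    """Score one diagnosis against ground truth (assumed definitions)."""
    pred_roots, true_roots = set(pred_roots), set(true_roots)
    hits = pred_roots & true_roots
    exact = pred_roots == true_roots
    any_hit = bool(hits)
    # Assumption: grounded = predicted path matches a verified path
    # that begins at a correctly identified root cause.
    grounded = any_hit and any(
        list(pred_path) == list(p) and p[0] in hits for p in true_paths
    )
    return {"exact": exact, "any_hit": any_hit, "grounded": grounded}


def rates(scores):
    """Fraction of cases passing each check."""
    n = len(scores)
    return {k: sum(s[k] for s in scores) / n for k in ("exact", "any_hit", "grounded")}


# Hypothetical ground truth shared by three toy predictions.
true_roots = ["db"]
true_paths = [["db", "cart", "checkout", "frontend"]]

predictions = [
    (["db"], ["db", "cart", "checkout", "frontend"]),  # correct and grounded
    (["db", "cache"], ["db", "search", "frontend"]),   # hit, but ungrounded
    (["cache"], ["cache", "frontend"]),                # miss
]

scores = [score_case(r, p, true_roots, true_paths) for r, p in predictions]
summary = rates(scores)
print(summary)
```

The second prediction is the case an outcome-only metric would partly reward: it names "db" but explains the symptom through a path that was never verified.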
Topics
- Root Cause Analysis
- LLM Agents
- Causal Inference
- OpenRCA 2.0
- PAVE Protocol
- Evaluation Metrics
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.