OpenRCA 2.0: From Outcome Labels to Causal Process Supervision

2026-06-25 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

OpenRCA 2.0 is introduced as the first cross-system root cause analysis (RCA) benchmark with step-wise causal annotations for large language model (LLM) agents. Existing RCA datasets typically simplify the task by labeling only the root cause and omitting the propagation path from that cause to the observed symptom. To address this, OpenRCA 2.0 relies on PAVE, a step-wise labeling protocol that uses known fault-injection interventions to reconstruct causal propagation paths through forward verification. The benchmark comprises 500 instances. In evaluations of 11 frontier LLMs on OpenRCA 2.0, agents recovered the exact root-cause set in only 20.7% of cases on average. Further analysis identified a failure mode called "ungrounded diagnosis": agents correctly identify at least one root-cause service in 76.0% of cases but ground it in a verified causal propagation path in only 61.5%. The gap shows that outcome-only evaluation masks critical failure modes, and that reliable LLM-based RCA needs step-wise causal ground truth.

Key takeaway

If you are an MLOps engineer evaluating LLM agents for root cause analysis, outcome-only metrics are not enough. They can mask "ungrounded diagnosis," where an agent names a correct root cause but does not tie it to a verified causal path. Benchmarks such as OpenRCA 2.0, which provide step-wise causal ground truth, let you check whether a diagnosis is backed by a verifiable propagation path and not only whether the final answer matches.

Key insights

Outcome-only root cause analysis evaluation for LLMs hides critical "ungrounded diagnosis" failures, necessitating step-wise causal path supervision.

Principles

Existing RCA datasets often label only the root cause, not the propagation path.
Forward verification reconstructs causal paths.
Outcome-only evaluation hides reasoning failures.

Method

PAVE is a step-wise labeling protocol that uses known fault injection to reconstruct causal propagation paths via forward verification, reasoning from cause to effect.
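
The cause-to-effect direction described above can be illustrated with a small sketch. This is not the PAVE implementation; the summary does not specify its details. It assumes a hypothetical dependency map `edges` (which services a given service can affect), a known injected fault `injected`, and a set `anomalous` of services observed to misbehave, and it keeps only the hops whose downstream service was actually anomalous.

```python
from collections import deque


def forward_verify(edges, injected, anomalous):
    """Walk from the injected fault toward symptoms (cause -> effect).

    edges:     dict mapping a service to the services it can affect
               (hypothetical input format, assumed for illustration)
    injected:  the service where the fault was injected
    anomalous: set of services observed to be anomalous
    Returns the list of (cause, effect) hops that passed the check.
    """
    verified = []
    seen = {injected}
    queue = deque([injected])
    while queue:
        cause = queue.popleft()
        for effect in edges.get(cause, []):
            # Keep a hop only if the downstream service really misbehaved.
            if effect in anomalous and effect not in seen:
                verified.append((cause, effect))
                seen.add(effect)
                queue.append(effect)
    return verified


edges = {
    "db": ["orders", "inventory"],
    "orders": ["checkout"],
    "inventory": ["checkout"],
    "checkout": ["frontend"],
}
path = forward_verify(edges, "db", {"db", "orders", "checkout", "frontend"})
print(path)
```

Because the fault location is known from the injection, the walk never has to guess the cause from the symptom; it only confirms or rejects each downstream hop. In the example, `inventory` is dropped because it showed no anomaly.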

In practice

Evaluate LLMs with OpenRCA 2.0.
Annotate evaluation data with step-wise causal ground truth.
Ground root causes in propagation paths.
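
To make the difference between outcome-only and path-grounded scoring concrete, here is a minimal sketch. The metric definitions are assumptions for illustration, not the benchmark's official scoring: a prediction counts as "grounded" here only if it hits a true root cause, its first hop starts at a true root cause, and every predicted hop appears in the verified path. The record format (`roots`, `path`) is also hypothetical.

```python
def score(predictions, truths):
    """Compare outcome-only metrics with a path-grounded one.

    Each record is a dict with:
      'roots': set of root-cause services
      'path':  list of (cause, effect) hops
    (hypothetical format, assumed for illustration)
    """
    n = len(truths)
    exact = hit = grounded = 0
    for pred, truth in zip(predictions, truths):
        pred_roots, true_roots = set(pred["roots"]), set(truth["roots"])
        if pred_roots == true_roots:
            exact += 1  # outcome-only: exact root-cause set
        if pred_roots & true_roots:
            hit += 1  # outcome-only: at least one correct root cause
            true_hops = set(truth["path"])
            pred_hops = pred["path"]
            # Assumed grounding rule: non-empty path, starts at a true
            # root cause, and every hop is in the verified path.
            if (
                pred_hops
                and pred_hops[0][0] in true_roots
                and all(hop in true_hops for hop in pred_hops)
            ):
                grounded += 1
    return {
        "exact_root_set": exact / n,
        "any_root_hit": hit / n,
        "grounded_hit": grounded / n,
    }


truths = [
    {"roots": {"db"}, "path": [("db", "orders"), ("orders", "checkout")]},
    {"roots": {"cache"}, "path": [("cache", "search")]},
]
predictions = [
    {"roots": {"db"}, "path": [("db", "orders"), ("orders", "checkout")]},
    {"roots": {"cache", "search"}, "path": [("search", "cache")]},
]
result = score(predictions, truths)
print(result)
```

The second prediction names a correct root cause but reverses the propagation direction, so it counts toward `any_root_hit` and not toward `grounded_hit`. That is the kind of gap an outcome-only metric cannot show.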

Topics

Root Cause Analysis
LLM Agents
Causal Inference
OpenRCA 2.0
PAVE Protocol
Evaluation Metrics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.