OpenRCA 2.0: From Outcome Labels to Causal Process Supervision

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

OpenRCA 2.0 is introduced as the first cross-system root cause analysis (RCA) benchmark with step-wise causal annotations for LLM agents; it comprises 500 instances. The benchmark addresses a gap in existing datasets, which typically label only the root cause and thereby simplify the task. To build it, the authors developed the PAVE protocol, which uses known fault-injection interventions to reconstruct causal propagation paths through forward verification. An evaluation of 11 frontier LLMs on OpenRCA 2.0 found that exact recovery of the root-cause set succeeds in only 20.7% of cases. Agents identify at least one correct root-cause service in 76.0% of cases, but they ground it in a verified causal path to the symptom in only 61.5%. The paper calls this failure mode "ungrounded diagnosis" and notes that outcome-only evaluation hides it.
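
The three measurements in the summary (exact root-cause set recovery, at least one correct root-cause service, and grounding in a verified path) can be told apart with a small sketch. This is an illustration only, not the benchmark's scoring code: the `GroundTruth` and `Diagnosis` types, their field names, and the rule that a path is grounded when every hop is a verified edge are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class GroundTruth:
    # Hypothetical annotation format, not the OpenRCA 2.0 schema.
    roots: set[str]                 # services where the fault was injected
    edges: set[tuple[str, str]]     # verified causal edges (cause, effect)
    symptom: str                    # service where the failure was observed


@dataclass
class Diagnosis:
    # Hypothetical agent output format.
    predicted_roots: set[str]
    predicted_path: list[str]       # claimed chain from a root cause to the symptom


def exact_match(d: Diagnosis, gt: GroundTruth) -> bool:
    """Outcome metric: the predicted root-cause set equals the annotated set."""
    return d.predicted_roots == gt.roots


def any_hit(d: Diagnosis, gt: GroundTruth) -> bool:
    """Outcome metric: at least one predicted root cause is correct."""
    return bool(d.predicted_roots & gt.roots)


def grounded(d: Diagnosis, gt: GroundTruth) -> bool:
    """Process metric (assumed definition): the claimed path starts at a correct
    predicted root, ends at the symptom, and every hop is a verified edge."""
    path = d.predicted_path
    if not path:
        return False
    if path[0] not in (d.predicted_roots & gt.roots) or path[-1] != gt.symptom:
        return False
    return all((a, b) in gt.edges for a, b in zip(path, path[1:]))


gt = GroundTruth(
    roots={"db"},
    edges={("db", "cart"), ("cart", "frontend")},
    symptom="frontend",
)
well_grounded = Diagnosis({"db"}, ["db", "cart", "frontend"])
ungrounded = Diagnosis({"db", "cache"}, ["db", "frontend"])  # right service, unverified hop
```

Under these assumptions, `ungrounded` passes `any_hit` but fails both `exact_match` and `grounded`, which is the pattern an outcome-only metric would not surface.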

Key takeaway

AI engineers developing or evaluating LLM agents for root cause analysis should move beyond outcome-only metrics. Evaluation should incorporate step-wise causal ground truth to expose "ungrounded diagnosis" failures, where an agent names a correct service but does not ground it in a verified causal path to the symptom. Prioritize models that demonstrate strong causal path grounding, since this is crucial for trustworthy and actionable RCA outputs.

Key insights

Robust LLM-based root cause analysis requires causal process supervision and step-wise path validation, not just outcome labels.

Method

The PAVE protocol uses fault injection to reconstruct causal propagation paths via forward verification, enabling step-wise causal annotations for RCA benchmarks.
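
The summary does not describe PAVE's internals, so the following is only one plausible reading of "forward verification", not the protocol itself. Starting from the known injection point, the sketch walks the dependency graph forward and keeps an edge only when the downstream service became anomalous no earlier than its upstream cause. The function name `forward_verify`, the `propagates_to` graph, and the `anomaly_onset` timestamps are hypothetical.

```python
from collections import deque


def forward_verify(
    injected: str,
    propagates_to: dict[str, list[str]],
    anomaly_onset: dict[str, float],
) -> list[tuple[str, str]]:
    """Breadth-first walk from the injected service.

    Keeps edge (u, v) only if v shows an anomaly whose onset is not earlier
    than u's onset. This temporal check is an assumed stand-in for whatever
    verification the real protocol performs.
    """
    if injected not in anomaly_onset:
        return []
    verified: list[tuple[str, str]] = []
    seen = {injected}
    queue = deque([injected])
    while queue:
        u = queue.popleft()
        for v in propagates_to.get(u, []):
            if v in seen:
                continue
            if v in anomaly_onset and anomaly_onset[v] >= anomaly_onset[u]:
                verified.append((u, v))
                seen.add(v)
                queue.append(v)
    return verified


# Toy system: a fault injected into "db" can reach "frontend" via "cart" or "search".
propagates_to = {"db": ["cart", "search"], "cart": ["frontend"], "search": ["frontend"]}
anomaly_onset = {"db": 0.0, "cart": 5.0, "frontend": 9.0}  # "search" stayed healthy
path_edges = forward_verify("db", propagates_to, anomaly_onset)
```

Because the injection point is known in advance, the walk only has to confirm propagation outward from it. An agent diagnosing the same incident has to reason in the opposite direction, from symptom back to cause.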

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.