OpenRCA 2.0: From Outcome Labels to Causal Process Supervision

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, medium

Summary

OpenRCA 2.0 is presented as the first cross-system root cause analysis (RCA) benchmark with step-wise causal annotations for evaluating large language model (LLM) agents. Existing RCA datasets typically label only the root cause and omit the path by which the fault propagates to the observed symptom. OpenRCA 2.0 fills this gap with PAVE, a step-wise labeling protocol that starts from known fault-injection interventions and reconstructs causal propagation paths through forward verification. The benchmark comprises 500 instances. Across 11 frontier LLMs, agents recover the exact root-cause set in only 20.7% of cases on average. Further analysis exposes "ungrounded diagnosis": agents correctly identify at least one root-cause service in 76.0% of cases but ground it in a verified causal propagation path in only 61.5%. Outcome-only evaluation therefore masks critical failure modes, and reliable LLM-based RCA needs step-wise causal ground truth.
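
The gap between these three figures is easier to see when the metrics are written out. The following is a minimal sketch, not the benchmark's actual scoring code: the `Diagnosis` record, its field names, and the rule that a path counts as grounded only if it starts at a true root cause, ends at the symptom, and uses only verified edges are all assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical record pairing one agent answer with its ground truth."""
    predicted_roots: frozenset[str]            # services the agent blames
    predicted_path: tuple[str, ...]            # ordered services, root cause -> symptom
    true_roots: frozenset[str]                 # services where the fault was injected
    true_edges: frozenset[tuple[str, str]]     # verified (cause, effect) hops
    symptom: str                               # service where the failure was observed


def exact_match(d: Diagnosis) -> bool:
    # Outcome metric: the predicted root-cause set equals the true set.
    return d.predicted_roots == d.true_roots


def any_hit(d: Diagnosis) -> bool:
    # Looser outcome metric: at least one true root cause was named.
    return bool(d.predicted_roots & d.true_roots)


def grounded(d: Diagnosis) -> bool:
    # Process metric (assumed definition): the path starts at a true root,
    # ends at the symptom, and every hop is a verified causal edge.
    path = d.predicted_path
    if not path or path[0] not in d.true_roots or path[-1] != d.symptom:
        return False
    return all(hop in d.true_edges for hop in zip(path, path[1:]))


def rate(diagnoses: list[Diagnosis], metric) -> float:
    # Fraction of cases for which the metric holds.
    return sum(metric(d) for d in diagnoses) / len(diagnoses)


# Toy ground truth shared by three made-up cases.
edges = frozenset({("db", "api"), ("api", "web")})
truth = dict(true_roots=frozenset({"db"}), true_edges=edges, symptom="web")

cases = [
    # Right root cause, right path.
    Diagnosis(frozenset({"db"}), ("db", "api", "web"), **truth),
    # Names the right service plus a wrong one, and skips a hop.
    Diagnosis(frozenset({"db", "cache"}), ("db", "web"), **truth),
    # Blames the symptom itself.
    Diagnosis(frozenset({"web"}), ("web",), **truth),
]

rates = {m.__name__: rate(cases, m) for m in (exact_match, any_hit, grounded)}
print(rates)
```

In this toy set the second case scores as a hit under `any_hit` yet fails `grounded`, which is the pattern the summary calls ungrounded diagnosis.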

Key takeaway

If you are an MLOps engineer evaluating LLM agents for root cause analysis, move beyond outcome-only metrics. They can mask "ungrounded diagnosis," in which an agent names a correct root cause without grounding it in a verified causal path. Benchmarks with step-wise causal ground truth, such as OpenRCA 2.0, let you check that an agent's diagnosis is verifiable as well as correct, which is what makes it trustworthy in production.

Key insights

Outcome-only evaluation of LLM root cause analysis hides critical "ungrounded diagnosis" failures; exposing them requires step-wise supervision of the causal path.

Method

PAVE is a step-wise labeling protocol that uses known fault injection to reconstruct causal propagation paths via forward verification, reasoning from cause to effect.
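
One way to picture forward verification is as a walk that starts at the injected fault and only follows hops that the telemetry supports. The sketch below is an illustration under stated assumptions, not the PAVE protocol itself: the function `forward_verify`, the `propagates_to` map of candidate edges, the `anomaly_onset` timestamps, and the rule "an effect must show an anomaly no earlier than its cause" are all hypothetical.

```python
from collections import deque


def forward_verify(injected, symptom, propagates_to, anomaly_onset):
    """Return a cause-to-effect path from the injected fault to the symptom.

    Assumed inputs (hypothetical, for illustration only):
      propagates_to: dict mapping a service to services it can affect
      anomaly_onset: dict mapping a service to the time its anomaly began;
                     services without an observed anomaly are absent
    Returns None when no fully verified path exists.
    """
    if injected not in anomaly_onset:
        return None  # the intervention itself left no observable trace

    parent = {injected: None}
    queue = deque([injected])
    while queue:
        cause = queue.popleft()
        if cause == symptom:
            # Rebuild the path by following parent links back to the fault.
            path = []
            node = cause
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for effect in propagates_to.get(cause, ()):
            onset = anomaly_onset.get(effect)
            if effect in parent or onset is None:
                continue  # already reached, or no anomaly observed there
            if onset >= anomaly_onset[cause]:
                # Keep the hop only if the effect follows its cause in time.
                parent[effect] = cause
                queue.append(effect)
    return None


# Toy system: a fault injected into "db" surfaces at "frontend".
propagates_to = {
    "db": ["orders", "search"],
    "orders": ["checkout"],
    "search": ["checkout"],
    "checkout": ["frontend"],
}
anomaly_onset = {"db": 0, "orders": 5, "checkout": 9, "frontend": 12}

path = forward_verify("db", "frontend", propagates_to, anomaly_onset)
print(path)
```

Because the injection point is known in advance, the search runs from cause to effect rather than backward from the symptom, so a branch such as `search`, which shows no anomaly in this toy data, is never admitted into the labeled path.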

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.