EARS: Explanatory Abstention for Reliable Sub-Agent Modeling in Large-scale Multi-Agent Systems

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

EARS (Explanatory Abstention for Reliable Sub-Agent Modeling) is a production-oriented framework designed to enhance the reliability of large-scale multi-agent systems (MAS) in enterprise settings. These systems, where a coordinator delegates requests to specialized sub-agents, often suffer from sub-agents over-answering ambiguous or unsupported requests, leading to hallucinations. EARS addresses this by reframing sub-agent abstention as an inter-agent communication protocol, enabling sub-agents to expose actionable failure states and rationales to the coordinator. The framework curates human-agent interaction data using an ensemble of calibrated LLM-as-a-Judge models, generating structured abstention labels and rationales under a taxonomy of failure modes. This data fine-tunes sub-agents to detect failure conditions and return rationales for coordinator-level clarification, rerouting, or fallback. Evaluated in a production e-commerce assistant, EARS improved the overall response pass rate from 68.5% to 78.9%.
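To make the protocol idea concrete, here is a minimal sketch of how a sub-agent might return either an answer or a structured abstention, and how a coordinator might act on it. The summary does not list the failure-mode taxonomy or any message schema, so everything here is an illustrative assumption: the `FailureMode` members, the `SubAgentResponse` fields, and the mapping in `coordinator_action` from failure mode to clarification, rerouting, or fallback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical categories. The source mentions a taxonomy of failure
    # modes but does not enumerate it; these names are placeholders.
    AMBIGUOUS_REQUEST = "ambiguous_request"
    UNSUPPORTED_REQUEST = "unsupported_request"
    MISSING_CONTEXT = "missing_context"


@dataclass
class SubAgentResponse:
    """Either a normal answer or an abstention carrying a rationale."""
    agent: str
    answer: Optional[str] = None
    failure_mode: Optional[FailureMode] = None
    rationale: Optional[str] = None

    @property
    def abstained(self) -> bool:
        # A response counts as an abstention when a failure mode is set.
        return self.failure_mode is not None


def coordinator_action(resp: SubAgentResponse) -> str:
    """Choose a coordinator-level action (assumed mapping, not from the source)."""
    if not resp.abstained:
        return "answer"
    if resp.failure_mode is FailureMode.AMBIGUOUS_REQUEST:
        return "clarify"   # ask the user a follow-up question
    if resp.failure_mode is FailureMode.UNSUPPORTED_REQUEST:
        return "reroute"   # hand the request to another sub-agent
    return "fallback"      # any other failure mode: use a default path


if __name__ == "__main__":
    resp = SubAgentResponse(
        agent="returns",
        failure_mode=FailureMode.AMBIGUOUS_REQUEST,
        rationale="The request does not say which order is meant.",
    )
    print(coordinator_action(resp), "-", resp.rationale)
```

The point of the sketch is that the rationale travels with the abstention, so the coordinator has something to act on rather than a bare refusal.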

Key takeaway

For MLOps engineers managing large-scale multi-agent systems, EARS offers a way to improve reliability. If your sub-agents, especially smaller fine-tuned models, frequently hallucinate or over-answer, explanatory abstention lets them report an actionable failure state and rationale instead of guessing, so the coordinator can clarify, reroute, or fall back. In the production e-commerce assistant where EARS was evaluated, the overall response pass rate rose from 68.5% to 78.9%, a gain of 10.4 percentage points. Consider adopting the framework if your enterprise AI assistants show the same failure pattern.

Key insights

EARS improves multi-agent system reliability by having sub-agents abstain with an explanation that the coordinator can act on.

Method

EARS uses an ensemble of calibrated LLM-as-a-Judge models to curate human-agent interaction data, generating structured abstention labels and rationales. This data then fine-tunes sub-agents to detect failure conditions and return explanations.
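The curation step can be illustrated with a small sketch of combining several judge verdicts into one training label. The summary does not say how the ensemble is aggregated or calibrated, so the majority vote, the `min_agreement` threshold, and the names `JudgeVerdict` and `aggregate_verdicts` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class JudgeVerdict(NamedTuple):
    label: str      # e.g. "answer" or a failure-mode name (hypothetical values)
    rationale: str  # the judge's explanation for its label


def aggregate_verdicts(
    verdicts: Sequence[JudgeVerdict],
    min_agreement: float = 0.6,  # assumed threshold, not from the source
) -> Optional[JudgeVerdict]:
    """Majority-vote the judges; return None when agreement is too low."""
    if not verdicts:
        return None
    counts = Counter(v.label for v in verdicts)
    label, votes = counts.most_common(1)[0]
    if votes / len(verdicts) < min_agreement:
        return None  # discard examples the judges disagree on
    # Keep the rationale from the first judge that chose the winning label.
    rationale = next(v.rationale for v in verdicts if v.label == label)
    return JudgeVerdict(label, rationale)


if __name__ == "__main__":
    votes = [
        JudgeVerdict("ambiguous_request", "No order ID was given."),
        JudgeVerdict("ambiguous_request", "The item is not specified."),
        JudgeVerdict("answer", "The request looks answerable."),
    ]
    print(aggregate_verdicts(votes))
```

Dropping low-agreement examples is one simple way to keep noisy labels out of the fine-tuning set; a production pipeline could instead weight judges by their calibration.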

Best for: AI Architect, Research Scientist, CTO, AI Scientist, MLOps Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.