EARS: Explanatory Abstention for Reliable Sub-Agent Modeling in Large-scale Multi-Agent Systems

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics) · Depth: Advanced, medium

Summary

The EARS (Explanatory Abstention for Reliable Sub-Agent Modeling) framework addresses reliability issues in large-scale multi-agent systems (MAS), where domain-specialized sub-agents often over-answer ambiguous or unsupported requests and produce hallucinations. EARS reframes sub-agent abstention as an inter-agent communication protocol, so that sub-agents expose actionable failure states to a coordinator. The framework curates human-agent interaction data with an ensemble of calibrated LLM-as-a-Judge models, which generate structured abstention labels and rationales based on a taxonomy of sub-agent failure modes. These data are then used to fine-tune sub-agents to detect specific failure conditions and to return rationales that support coordinator-level clarification, rerouting, or fallback. Evaluated in a production e-commerce assistant for business intelligence, EARS improved the overall response pass rate from 68.5% to 78.9%, a gain of 10.4 percentage points.

Key takeaway

For AI engineers building multi-agent systems in enterprise settings, EARS offers a way to improve system reliability and reduce hallucinations. With explanatory abstention, sub-agents return actionable failure rationales that let the coordinator clarify the request, reroute it, or fall back. In the production e-commerce assistant that was evaluated, the response pass rate rose by 10.4 percentage points (from 68.5% to 78.9%), which suggests the approach can make a MAS more dependable for complex business intelligence tasks.
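The reported improvement can be read either as an absolute or as a relative change, and the two differ noticeably. The short calculation below uses only the two pass rates quoted in the summary; the variable names are illustrative.

```python
# Pass rates quoted in the summary, in percent.
before, after = 68.5, 78.9

# Absolute change, measured in percentage points.
absolute_gain = after - before

# Relative change, measured as a percentage of the starting pass rate.
relative_gain = absolute_gain / before * 100

print(f"{absolute_gain:.1f} percentage points, {relative_gain:.1f}% relative")
# prints "10.4 percentage points, 15.2% relative"
```

The 10.4 figure is therefore a percentage-point difference, while the relative improvement over the baseline is roughly 15%.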

Key insights

EARS improves multi-agent system reliability by having sub-agents explain why they cannot answer a request, which gives the coordinator enough information to act.
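A minimal sketch of how an abstention with a rationale might drive coordinator behavior is shown below. It is not the paper's implementation: the failure mode names, the `SubAgentReply` type, and the `coordinate` function are all hypothetical, since the summary does not enumerate the taxonomy or the message schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical taxonomy entries; the source does not list its failure modes.
    AMBIGUOUS_REQUEST = "ambiguous_request"
    OUT_OF_DOMAIN = "out_of_domain"
    UNSUPPORTED = "unsupported"


@dataclass
class SubAgentReply:
    """Either an answer, or an abstention carrying a failure mode and rationale."""
    answer: Optional[str] = None
    failure_mode: Optional[FailureMode] = None
    rationale: str = ""

    @property
    def abstained(self) -> bool:
        return self.failure_mode is not None


def coordinate(reply: SubAgentReply) -> str:
    """Map a sub-agent reply to one coordinator action (assumed mapping)."""
    if not reply.abstained:
        return "answer"
    if reply.failure_mode is FailureMode.AMBIGUOUS_REQUEST:
        return "clarify"   # ask the user to disambiguate
    if reply.failure_mode is FailureMode.OUT_OF_DOMAIN:
        return "reroute"   # hand the request to another sub-agent
    return "fallback"      # no specialized agent can help


reply = SubAgentReply(
    failure_mode=FailureMode.AMBIGUOUS_REQUEST,
    rationale="The request does not say which time period 'sales' refers to.",
)
action = coordinate(reply)
```

The point of the structure is that the coordinator branches on a machine-readable failure state rather than on free text, while the rationale remains available for composing a clarifying question.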

Method

EARS curates human-agent interaction data with an ensemble of calibrated LLM-as-a-Judge models, which generate structured abstention labels and rationales. These data are then used to fine-tune sub-agents to detect failure conditions and return rationales that the coordinator can act on.
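The labeling step can be pictured as several judges voting on each interaction. The sketch below assumes a simple strict-majority vote and uses stub functions in place of real LLM judges; the summary does not say how the ensemble is aggregated or calibrated, so `ensemble_label`, the `Judge` type, and the label strings are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# A judge maps (request, response) to (label, rationale). Hypothetical interface.
Judge = Callable[[str, str], Tuple[str, str]]


def ensemble_label(
    request: str,
    response: str,
    judges: Sequence[Judge],
    min_agreement: float = 0.5,
) -> Optional[Tuple[str, str]]:
    """Return (label, rationale) if a strict majority of judges agree, else None."""
    if not judges:
        raise ValueError("at least one judge is required")
    votes = [judge(request, response) for judge in judges]
    counts = Counter(label for label, _ in votes)
    label, n = counts.most_common(1)[0]
    if n / len(votes) <= min_agreement:
        return None  # no clear consensus; leave the example unlabeled
    # Keep the rationale from the first judge that voted for the winning label.
    rationale = next(r for voted, r in votes if voted == label)
    return label, rationale


# Stub judges standing in for LLM calls (assumption for illustration only).
def judge_a(req: str, resp: str) -> Tuple[str, str]:
    return "ambiguous_request", "No time range was given."


def judge_b(req: str, resp: str) -> Tuple[str, str]:
    return "ambiguous_request", "The metric is underspecified."


def judge_c(req: str, resp: str) -> Tuple[str, str]:
    return "answerable", "The response looks adequate."


labeled = ensemble_label("Show me sales", "Sales were high.", [judge_a, judge_b, judge_c])
split = ensemble_label("Show me sales", "Sales were high.", [judge_a, judge_c])
```

Dropping examples without consensus is one plausible way to keep noisy labels out of the fine-tuning set; other aggregation rules, such as confidence-weighted voting, would fit the same interface.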

Best for: AI Architect, Research Scientist, CTO, AI Scientist, MLOps Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.