DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

2026-03-03 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

The paper introduces the Dynamic Interaction Graph (DIG), a framework for making multi-agent systems (MAS) built from general-purpose Large Language Model (LLM) agents more reliable and explainable. Rather than relying on predefined roles or workflows, DIG supports emergent collaboration by modeling agent activations and interactions as a time-evolving causal network. Collaboration-induced error patterns can then be identified, explained, and corrected in real time, directly from the agents' interaction paths. The framework formalizes interaction structure, characterizes failure modes topologically, and supports structure-driven system healing. In experiments on the Count Frequency and 20 Newsgroups Classification tasks, MAS+DIG is reported to significantly reduce Root Mean Squared Error (RMSE) and improve valid output rates relative to the MAS-only and MAS+LLM Judge baselines, especially on complex tasks with up to 20 agents, by actively detecting and resolving failures with negligible overhead.

Key takeaway

Research scientists developing or deploying autonomous multi-agent LLM systems can use the Dynamic Interaction Graph (DIG) framework to see how emergent collaboration actually unfolds. DIG supports diagnosing and automatically correcting interaction failures in real time, which the paper reports improves system reliability and accuracy, particularly as agent count and task complexity scale. The aim is coordination that stays robust and transparent without depending on predefined workflows.

Key insights

The Dynamic Interaction Graph (DIG) makes emergent collaboration among multiple LLM agents observable, explainable, and correctable in real time.

Principles

Collaboration itself is the primary object of analysis.
System-level reasoning reduces to inference over DIG topology.
Failures are defined by observable interaction patterns.

Method

DIG constructs a time-evolving bipartite causal graph from execution traces, representing agent activations and events. Local graph rewrite operators (Respond, Wait, Reroute, Discard, Submit) define how agents transform interaction edges, enabling real-time failure detection and healing interventions.
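
To make the graph structure concrete, here is a minimal Python sketch of a bipartite graph with activation nodes and event nodes, built up from a short execution trace. It is an illustration, not the paper's implementation: the class name `DIG`, its methods, the identifier scheme, and the choice of which events each operator consumes or emits are all assumptions. Only the five operator names come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Op(Enum):
    # Operator names are from the summary; their semantics here are assumed.
    RESPOND = "respond"
    WAIT = "wait"
    REROUTE = "reroute"
    DISCARD = "discard"
    SUBMIT = "submit"


@dataclass
class DIG:
    """Toy bipartite graph: edges only connect an event to an activation."""
    activations: dict = field(default_factory=dict)  # activation id -> (agent, op)
    events: dict = field(default_factory=dict)       # event id -> payload
    consumed_by: dict = field(default_factory=dict)  # event id -> [activation ids]
    produced_by: dict = field(default_factory=dict)  # event id -> activation id

    def add_event(self, payload, producer=None):
        eid = f"e{len(self.events)}"
        self.events[eid] = payload
        if producer is not None:
            self.produced_by[eid] = producer
        return eid

    def activate(self, agent, op, inputs, outputs=()):
        """Record one agent activation that consumes `inputs` and emits `outputs`."""
        aid = f"a{len(self.activations)}"
        self.activations[aid] = (agent, op)
        for eid in inputs:
            self.consumed_by.setdefault(eid, []).append(aid)
        return aid, [self.add_event(p, producer=aid) for p in outputs]

    def pending(self):
        """Events that no activation has consumed yet."""
        return [e for e in self.events if e not in self.consumed_by]

    def causal_path(self, aid):
        """Activation ids that causally precede `aid`, nearest first."""
        path, frontier = [], [aid]
        while frontier:
            cur = frontier.pop()
            for eid, consumers in self.consumed_by.items():
                if cur in consumers and eid in self.produced_by:
                    parent = self.produced_by[eid]
                    if parent not in path:
                        path.append(parent)
                        frontier.append(parent)
        return path


# Hypothetical three-step trace: respond, reroute, submit.
g = DIG()
task = g.add_event("count word frequencies")
a0, (partial,) = g.activate("agent_1", Op.RESPOND, [task], ["partial counts"])
a1, (fwd,) = g.activate("agent_2", Op.REROUTE, [partial], ["partial counts, forwarded"])
a2, _ = g.activate("agent_3", Op.SUBMIT, [fwd])

print(g.causal_path(a2))  # activations that led to the final submission
print(g.pending())        # events still waiting for a consumer
```

Keeping producer and consumer edges in separate maps makes the two questions the method relies on, "what caused this activation" and "what is still unhandled", each a simple lookup.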

In practice

Use DIG to trace causal dependencies in multi-agent systems.
Apply DIG's failure taxonomy for real-time error detection.
Implement DIG-driven healing to correct collaboration failures.
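
The detect-then-heal loop in the steps above can be sketched as follows. This summary does not list the paper's failure taxonomy or its healing rules, so the two patterns checked here (an event nobody consumes, and an agent that keeps rerouting) and the proposed interventions (`redispatch`, `force_submit`) are hypothetical stand-ins, as is the trace format.

```python
from collections import Counter

# Hypothetical trace rows: (agent, operator, consumed_event, produced_event or None).
trace = [
    ("agent_1", "respond", "task", "draft"),
    ("agent_2", "reroute", "draft", "draft@agent_3"),
    ("agent_3", "reroute", "draft@agent_3", "draft@agent_2"),
    ("agent_2", "reroute", "draft@agent_2", "draft@agent_3b"),
    ("agent_1", "respond", "task", "side_note"),
]


def unconsumed_events(trace):
    """Assumed failure pattern 1: events that were produced but never consumed."""
    produced = {out for _, _, _, out in trace if out is not None}
    consumed = {inp for _, _, inp, _ in trace}
    return sorted(produced - consumed)


def reroute_heavy_agents(trace, limit=1):
    """Assumed failure pattern 2: agents that reroute more than `limit` times."""
    counts = Counter(agent for agent, op, _, _ in trace if op == "reroute")
    return sorted(agent for agent, n in counts.items() if n > limit)


def propose_healing(trace):
    """Map each detected pattern to an illustrative intervention."""
    actions = [("redispatch", event) for event in unconsumed_events(trace)]
    actions += [("force_submit", agent) for agent in reroute_heavy_agents(trace)]
    return actions


for action, target in propose_healing(trace):
    print(action, target)
```

Both checks read only the structure of the trace, never the message contents, which matches the principle that failures are defined by observable interaction patterns.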

Topics

Multi-agent Systems
Large Language Models
Dynamic Interaction Graph
Emergent Collaboration
Failure Detection and Healing

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.