Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

2026-06-19 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

A new framework, Contagion Networks, quantifies evaluator bias propagation in multi-agent LLM systems. Experiments using DeepSeek-chat agents with distinct bias profiles (structured, balanced, evidence-based) revealed that evaluator biases consistently propagate, with mean contagion coefficients γ ∈ [0.143, 0.304]. Homogeneous-model agents typically operate in a "suppression regime," where bias attenuates rapidly across hops (cumulative factor β₃=0.0055). This contrasts sharply with prior cross-model work (MM-EPC), which observed 3–5× stronger contagion (γ ≈ 0.85–1.3) leading to a "cascade regime." Crucially, increasing the evaluator committee size from k=1 to k=3 reduced effective contagion by 72.4%, demonstrating a practical mitigation strategy. The open-source framework is released for community use.
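As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. Two assumptions here are not stated in the source: that β₃ is the product of three equal per-hop factors, and that the 72.4% reduction scales every contagion coefficient uniformly. The variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the summary.
gamma_low, gamma_high = 0.143, 0.304  # reported range of mean contagion coefficients
beta_3 = 0.0055                       # reported cumulative factor after three hops
reduction = 0.724                     # reported drop in effective contagion, k=1 -> k=3

# Assumption: beta_3 is a product of three equal per-hop factors,
# so the implied per-hop factor is its cube root.
per_hop = beta_3 ** (1 / 3)

# Assumption: the 72.4% reduction applies uniformly to each coefficient.
k3_low = gamma_low * (1 - reduction)
k3_high = gamma_high * (1 - reduction)

print(f"implied per-hop factor: {per_hop:.3f}")
print(f"effective range at k=3: [{k3_low:.3f}, {k3_high:.3f}]")
```

Under these assumptions the implied per-hop factor is about 0.18, which falls inside the reported γ range, and the k=3 range shrinks to roughly 0.04 to 0.08.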

Key takeaway

AI architects designing multi-agent LLM systems in which agents evaluate each other should account for evaluator bias propagation. Measure the Cross-Agent Contagion Matrix Γₚ before deployment to predict system stability. Prefer homogeneous model families for evaluators, since they showed natural bias suppression in these experiments. Use evaluator committees of at least three agents, and monitor strategy entropy H(ω) as a real-time health indicator of cognitive diversity.
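One way to monitor strategy entropy is sketched below. The source does not define H(ω), so this assumes it is the Shannon entropy of an agent's normalised strategy weights ω. The function names and the 0.5 alert fraction are hypothetical choices for illustration.

```python
import math

def strategy_entropy(weights):
    """Shannon entropy (in nats) of non-negative weights, normalised to sum to 1.

    Assumption: H(omega) in the article is this standard Shannon entropy.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights if w > 0]  # zero weights contribute nothing
    return -sum(p * math.log(p) for p in probs)

def diversity_alert(weights, min_fraction=0.5):
    """Return True when entropy falls below a fraction of its maximum, log(n).

    The 0.5 default is an arbitrary illustrative threshold, not a value from the article.
    """
    n = len(weights)
    if n < 2:
        return True  # a single strategy has no diversity to lose
    return strategy_entropy(weights) < min_fraction * math.log(n)

# Evenly spread strategies versus a near-collapsed distribution.
print(diversity_alert([1.0, 1.0, 1.0]))
print(diversity_alert([0.98, 0.01, 0.01]))
```

Normalising against log(n) keeps the threshold comparable across agents with different numbers of strategies.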

Key insights

Evaluator bias propagates in multi-agent LLM systems; homogeneous models suppress it, and diverse evaluator committees mitigate it.

Principles

Bias propagation magnitude depends on evaluator diversity.
Homogeneous model pools provide implicit regularization.
Spectral radius ρ(Γₚ) governs propagation regimes.
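The role of the spectral radius can be illustrated with a minimal NumPy sketch. It assumes bias propagates linearly, so that n hops correspond to applying Γₚ n times, and it assumes the boundary between the suppression and cascade regimes sits at ρ = 1, which is the standard convergence condition for repeated application of a linear map. The article may define the regimes differently. Both matrices below are made-up examples, not measured values.

```python
import numpy as np

def spectral_radius(gamma: np.ndarray) -> float:
    """Largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(gamma))))

def classify_regime(gamma: np.ndarray, threshold: float = 1.0) -> str:
    """Label a matrix using the article's regime names.

    Assumption: the regimes split at spectral radius 1.
    """
    return "suppression" if spectral_radius(gamma) < threshold else "cascade"

def propagate(gamma: np.ndarray, bias, hops: int) -> np.ndarray:
    """Apply the matrix `hops` times to an initial bias vector (assumed linear model)."""
    state = np.asarray(bias, dtype=float)
    for _ in range(hops):
        state = gamma @ state
    return state

# Hypothetical 3-agent contagion matrices with zero self-influence.
weak = 0.2 * (np.ones((3, 3)) - np.eye(3))
strong = 0.9 * (np.ones((3, 3)) - np.eye(3))

for name, matrix in [("weak", weak), ("strong", strong)]:
    rho = spectral_radius(matrix)
    after = propagate(matrix, [1.0, 1.0, 1.0], hops=3)
    print(name, round(rho, 3), classify_regime(matrix), np.round(after, 3))
```

In this toy setup the weakly coupled matrix shrinks the bias vector at every hop, while the strongly coupled one amplifies it.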

Method

The Contagion Networks framework measures bias propagation using a Cross-Agent Contagion Matrix Γₚ and Test-Time Reinforcement Learning (TTRL) for strategy updates.

In practice

Measure Γₚ before system deployment.
Prefer homogeneous evaluator pools.
Use committees of ≥ 3 evaluators.
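The committee recommendation could be enforced with a small guard like the one below. The source does not say how committee scores are aggregated, so a plain mean is assumed here. The function name and the example scores are hypothetical.

```python
from statistics import mean

def committee_score(scores, min_size=3):
    """Average evaluator scores, refusing committees smaller than min_size.

    Assumption: scores are combined by a simple mean; the article does not
    specify the aggregation rule.
    """
    if len(scores) < min_size:
        raise ValueError(f"need at least {min_size} evaluators, got {len(scores)}")
    return mean(scores)

# Made-up scores: one evaluator rates much higher than the other two.
single = 0.9
committee = committee_score([0.9, 0.55, 0.6])
print(f"single evaluator: {single:.2f}, committee of three: {committee:.2f}")
```

With these illustrative numbers, the high outlier is pulled toward the other two scores, which shows how a committee can dilute one evaluator's bias.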

Topics

Multi-Agent Systems
LLM Evaluation
Bias Propagation
Contagion Networks
Evaluator Diversity
DeepSeek-chat

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.