Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

2026-06-19 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

The study "Heterogeneous LLM Debate Under Adversarial Peers" investigates how mixed-model LLM debate panels behave under adversarial influence. It measures changes in honest agents' revision behavior: how often they change answers, and whether those revisions are corrective or harmful. Across four model families and three reasoning benchmarks (MATH-hard, SciBench, GSM8K), adding an honest heterogeneous peer significantly reduces harmful revision. For Llama-3.1-70B defenders on MATH-hard, the harmful-revision rate dropped from 89% in homogeneous panels to 35% with an honest peer, but an adversarial peer returned it to 90%. Crucially, when an adversary is already present, an honest heterogeneous peer acts as a defense, cutting the flip rate on initially-correct items from 31% to 6% for Llama-3.1-70B on MATH-hard. The results show that heterogeneity is both an attack surface and a defense.

Key takeaway

AI security engineers and ML teams deploying multi-agent LLM systems should assess the integrity of every heterogeneous peer they add. Diversity can significantly reduce harmful revisions and can act as a defense in an already-compromised panel, but a single adversarial peer can negate those benefits. Prioritize robust peer vetting and monitor end-of-debate flip rates, especially for weaker models, to detect subtle adversarial influence.

Key insights

Heterogeneity in LLM debate is a double-edged sword, offering both corrective gains and adversarial vulnerability.

Principles

Honest heterogeneity lowers harmful revision.
Adversarial peers reverse honest gains.
Heterogeneity defends against existing adversaries.

Method

The study uses a multi-agent debate protocol with three agents and five rounds, varying panel composition. It measures revision behavior via detection-generation decomposition, focusing on corrective vs. harmful changes and end-of-debate flip rates.
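The revision metrics named above can be illustrated with a minimal sketch. The record schema (`DebateRecord`), the function name `revision_metrics`, and the exact metric definitions are assumptions made for illustration, not the paper's implementation: "harmful" is taken here to mean a revision from a correct to an incorrect answer, "corrective" the reverse, and the flip rate is the share of initially-correct items that end the debate incorrect. The sketch does not attempt the detection-generation decomposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebateRecord:
    """One honest agent's outcome on one item (hypothetical schema)."""
    initial_correct: bool  # was the first-round answer correct?
    final_correct: bool    # was the last-round answer correct?
    revised: bool          # does the final answer differ from the initial one?


def revision_metrics(records):
    """Summarize revision behavior under the assumed definitions above."""
    revised = [r for r in records if r.revised]
    # Harmful: a revision that turns a correct answer into an incorrect one.
    harmful = sum(1 for r in revised if r.initial_correct and not r.final_correct)
    # Corrective: a revision that turns an incorrect answer into a correct one.
    corrective = sum(1 for r in revised if not r.initial_correct and r.final_correct)
    # Flip rate is conditioned on items the agent initially answered correctly.
    initially_correct = [r for r in records if r.initial_correct]
    flipped = sum(1 for r in initially_correct if not r.final_correct)
    return {
        "revision_rate": len(revised) / len(records) if records else 0.0,
        "harmful_share": harmful / len(revised) if revised else 0.0,
        "corrective_share": corrective / len(revised) if revised else 0.0,
        "flip_rate": flipped / len(initially_correct) if initially_correct else 0.0,
    }
```

Under these definitions, harmful and corrective shares need not sum to one, because a revision from one wrong answer to another wrong answer counts as neither.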

In practice

Evaluate peer integrity before deployment.
Use heterogeneous peers in contaminated panels.
Monitor flip rates for weak defenders.
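One way to act on the flip-rate recommendation is a rolling monitor, sketched below. Everything here is hypothetical: the class name `FlipRateMonitor`, the window size, the minimum sample count, and the 15% alert threshold are illustrative placeholders, not values from the study. The sketch also assumes that correctness can be checked, for example on probe items with known answers, which the source does not specify.

```python
from collections import deque


class FlipRateMonitor:
    """Rolling flip rate over initially-correct items (illustrative only)."""

    def __init__(self, window=200, threshold=0.15, min_samples=20):
        # Each entry is True if an initially-correct answer ended incorrect.
        self._flips = deque(maxlen=window)
        self.threshold = threshold      # hypothetical alert level
        self.min_samples = min_samples  # avoid alerting on tiny samples

    def observe(self, initial_correct, final_correct):
        # Only initially-correct items count toward the flip rate.
        if initial_correct:
            self._flips.append(not final_correct)

    @property
    def flip_rate(self):
        return sum(self._flips) / len(self._flips) if self._flips else 0.0

    def alert(self):
        # Alert only with enough samples and a rate strictly above threshold.
        return len(self._flips) >= self.min_samples and self.flip_rate > self.threshold
```

A deployment could keep one monitor per defender model and call `observe` after each debate on a probe item; a sustained `alert()` for a weaker defender would be a cue to re-examine the panel's peers.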

Topics

LLM Debate
Adversarial AI
Multi-agent Systems
Model Heterogeneity
Revision Behavior
AI Security

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.