Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience
Summary
The study "Heterogeneous LLM Debate Under Adversarial Peers" examines how mixed-model LLM debate panels behave when one peer is adversarial. It tracks how honest agents revise their answers: how often they change an answer, and whether each change is corrective or harmful. Across four model families and three reasoning benchmarks (MATH-hard, SciBench, GSM8K), adding an honest heterogeneous peer substantially reduces harmful revision. For Llama-3.1-70B defenders on MATH-hard, the harmful-revision rate fell from 89% in homogeneous panels to 35% with an honest peer, while an adversarial peer pushed it back up to 90%. When an adversary is already present, an honest heterogeneous peer acts as a defense, cutting the flip rate on initially correct items from 31% to 6% for Llama-3.1-70B on MATH-hard. Heterogeneity is therefore both an attack surface and a defense.
Key takeaway
If you are an AI security engineer or part of an ML team deploying multi-agent LLM systems, assess the integrity of any heterogeneous peer before adding it to a panel. Diversity can sharply reduce harmful revisions and can defend an already compromised panel, but a single adversarial peer can cancel those gains. Vet peers carefully and monitor end-of-debate flip rates, especially for weaker models, to detect subtle adversarial influence.
Key insights
Heterogeneity in LLM debate is a double-edged sword, offering both corrective gains and adversarial vulnerability.
Principles
- Honest heterogeneity lowers harmful revision.
- Adversarial peers reverse honest gains.
- Honest heterogeneity defends against existing adversaries.
Method
The study runs a multi-agent debate protocol with three agents over five rounds and varies the panel composition. It measures revision behavior through a detection-generation decomposition, focusing on corrective versus harmful changes and on end-of-debate flip rates.
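To make the measured quantities concrete, here is a minimal sketch of how revision metrics could be computed from debate transcripts. It is not the paper's code: the `ItemTrace` record layout, the function name `revision_metrics`, and the exact metric definitions (a revision is harmful if it abandons a correct answer, corrective if it arrives at one; the flip rate is the share of initially correct items that end incorrect) are assumptions made for illustration. The detection-generation decomposition is not reproduced here because the summary does not define it.

```python
from dataclasses import dataclass


@dataclass
class ItemTrace:
    """One honest agent's answers on one benchmark item (hypothetical layout)."""
    answers: list[str]  # answer after each round; index 0 is the initial answer
    gold: str           # reference answer


def revision_metrics(traces: list[ItemTrace]) -> dict[str, float]:
    """Compute revision, harmful/corrective, and flip rates (assumed definitions)."""
    opportunities = revisions = harmful = corrective = 0
    initially_correct = flipped = 0

    for t in traces:
        # Compare each round's answer with the previous round's answer.
        for prev, curr in zip(t.answers, t.answers[1:]):
            opportunities += 1
            if curr == prev:
                continue
            revisions += 1
            if prev == t.gold:
                harmful += 1      # abandoned a correct answer
            elif curr == t.gold:
                corrective += 1   # moved from a wrong answer to the correct one

        # End-of-debate flip: started correct, finished incorrect.
        if t.answers[0] == t.gold:
            initially_correct += 1
            if t.answers[-1] != t.gold:
                flipped += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "revision_rate": ratio(revisions, opportunities),
        "harmful_share": ratio(harmful, revisions),
        "corrective_share": ratio(corrective, revisions),
        "flip_rate": ratio(flipped, initially_correct),
    }
```

Under these assumed definitions, wrong-to-wrong changes count as revisions but are neither harmful nor corrective, so the two shares need not sum to one.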
In practice
- Evaluate peer integrity before deployment.
- Add an honest heterogeneous peer to panels that may already be contaminated.
- Monitor flip rates for weak defenders.
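The monitoring recommendation could be sketched as a rolling flip-rate check. This is an illustrative sketch only: the class name `FlipRateMonitor`, the window size, the alert threshold, and the minimum sample count are hypothetical and are not taken from the study. It also assumes you can score answers against known references, for example by routing a labeled canary set through the production panel.

```python
from collections import deque


class FlipRateMonitor:
    """Rolling end-of-debate flip rate over initially correct items (hypothetical)."""

    def __init__(self, window: int = 200, threshold: float = 0.15,
                 min_samples: int = 20) -> None:
        # All three defaults are placeholder values, not figures from the study.
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, initial_correct: bool, final_correct: bool) -> None:
        """Log one debate; only initially correct items can flip."""
        if initial_correct:
            self.outcomes.append(not final_correct)

    def flip_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def alert(self) -> bool:
        """True once enough samples exist and the rate exceeds the threshold."""
        return (len(self.outcomes) >= self.min_samples
                and self.flip_rate() > self.threshold)
```

A separate monitor per defender model keeps a weaker model's elevated flip rate from being averaged away by stronger ones.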
Topics
- LLM Debate
- Adversarial AI
- Multi-agent Systems
- Model Heterogeneity
- Revision Behavior
- AI Security
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.