Simulating Hate Speech Cascades with Multi-LLM Agents: Empirical Grounding, Modeling Fidelity, and Intervention Strategies

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Computational Social Science · Depth: Expert, extended

Summary

The study "Simulating Hate Speech Cascades with Multi-LLM Agents" investigates how hateful content propagates on online platforms, using Bluesky data from January 1 to April 12, 2026. The researchers analyzed three implicit hate speech cascades (anti-trans, Islamophobia, anti-DEI) and one size-matched benign control. The hateful cascades exhibited 97.4–99.7% hostile stance, higher toxicity-engagement homophily, and a star-like topology (breadth-to-size ratio of 84–93%, depth 4–6). A multi-LLM-agent simulator built on GPT-4o-mini reproduced these characteristics and uniquely differentiated hateful from benign content. Agent heterogeneity was identified as the primary driver of simulation fidelity. In intervention tests, warning labels can enlarge hateful cascades by 1.7–54.9%, consistent with the implied-truth effect, while amplifier targeting on dense networks reduced hateful spread by 7.5–12.9% at a cost of 5.7% benign collateral.
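The topology figures in the summary (breadth relative to size, and depth) can be made concrete with a small sketch. The definitions used here are assumptions, since the summary does not spell them out: size counts every account in the cascade including the root, depth is the longest reshare chain from the root, and breadth is the largest number of accounts at any single depth level. The function name `cascade_shape` and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def cascade_shape(edges, root):
    """Summarize a reshare tree given (parent, child) reshare pairs.

    Assumed definitions (not confirmed by the source):
      size    = number of accounts reached, root included
      depth   = number of hops on the longest chain from the root
      breadth = maximum number of accounts at any one depth level
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    level_counts = defaultdict(int)  # depth level -> number of accounts
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        level_counts[level] += 1
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))

    size = len(seen)
    breadth = max(level_counts.values())
    return {
        "size": size,
        "depth": max(level_counts),
        "breadth": breadth,
        "breadth_over_size": breadth / size,
    }


# Toy star-like cascade: eight direct reshares of the root, one second-hop reshare.
toy_edges = [("root", f"u{i}") for i in range(8)] + [("u0", "v0")]
shape = cascade_shape(toy_edges, "root")
print(shape)
```

A ratio near 1 with shallow depth indicates that most reshares come directly from the original post, which is the star-like pattern the summary describes.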

Key takeaway

For AI ethicists and platform moderation teams designing intervention strategies, this research shows that multi-LLM-agent simulations can reveal nuanced propagation dynamics. Be wary of warning labels: they may backfire and enlarge hateful cascades, consistent with the implied-truth effect. Consider amplifier targeting on dense follower networks instead, which reduced hateful spread by 7.5–12.9% with 5.7% benign collateral, and verify network density for each cascade before applying it.

Key insights

Multi-LLM agents can faithfully simulate implicit hate speech cascades by conditioning on user profiles, community, and content.

Principles

Hate speech cascades exhibit stance monoculture.
Agent heterogeneity drives simulation fidelity.
Warning labels can backfire via implied-truth.

Method

The multi-LLM-agent system assigns each simulated user a profile and a follower-graph neighborhood. GPT-4o-mini predicts that user's reshare probability from the profile, context, and post text, and the simulator treats the predicted probability as a Bernoulli parameter when deciding whether the user reshares.
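As a minimal sketch of the sampling step described above, the following simulates a cascade over a follower graph by drawing one Bernoulli trial per newly exposed user. It is an illustration under stated assumptions, not the authors' implementation: `stub_reshare_probability` is a hypothetical stand-in for the GPT-4o-mini call, the `aligned` profile field is invented for the example, and the choice to give each user a single decision on first exposure is a simplification.

```python
import random
from collections import deque


def stub_reshare_probability(profile, context, post_text):
    """Hypothetical stand-in for the LLM call; returns a probability in [0, 1].

    `context` here is simply the number of reshares so far (an assumption).
    """
    base = 0.6 if profile.get("aligned") else 0.05
    return min(1.0, base + 0.05 * context)


def simulate_cascade(followers, profiles, seed_user, post_text, prob_fn, rng):
    """Breadth-first cascade: each follower of a resharer is exposed once
    and reshares with the probability returned by prob_fn (Bernoulli draw)."""
    reshared = [seed_user]
    exposed = {seed_user}
    queue = deque([seed_user])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, []):
            if follower in exposed:
                continue  # simplification: one decision per user
            exposed.add(follower)
            p = prob_fn(profiles[follower], len(reshared), post_text)
            if rng.random() < p:  # Bernoulli trial with parameter p
                reshared.append(follower)
                queue.append(follower)
    return reshared


# Toy follower graph: user -> accounts that see that user's reshares.
toy_followers = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
toy_profiles = {
    "a": {"aligned": True},
    "b": {"aligned": True},
    "c": {"aligned": False},
    "d": {"aligned": True},
}
result = simulate_cascade(
    toy_followers, toy_profiles, "a", "example post",
    stub_reshare_probability, random.Random(0),
)
print(result)
```

Passing the probability function as an argument keeps the sampling loop independent of the model, so the stub can be swapped for a real LLM client without changing the cascade logic.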

In practice

Use probability-prediction prompts for LLM agents.
Target amplifiers on dense networks for moderation.
Conduct per-cascade density checks for interventions.
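The density check and amplifier targeting in the list above could be sketched as follows. Everything specific here is an assumption for illustration: density is computed as the standard directed-graph density m / (n(n-1)), "amplifiers" are taken to be the accounts with the most followers, and the threshold value is a hypothetical parameter that the source does not give.

```python
def follower_density(followers):
    """Directed graph density: edges / (n * (n - 1))."""
    nodes = set(followers)
    for fans in followers.values():
        nodes.update(fans)
    n = len(nodes)
    m = sum(len(set(fans)) for fans in followers.values())
    return m / (n * (n - 1)) if n > 1 else 0.0


def top_amplifiers(followers, k):
    """Assumed definition: the k accounts with the most followers.
    Ties are broken alphabetically so the result is deterministic."""
    return sorted(followers, key=lambda u: (-len(followers[u]), u))[:k]


def plan_intervention(followers, k, density_threshold):
    """Only propose amplifier targets when the network is dense enough."""
    density = follower_density(followers)
    targets = top_amplifiers(followers, k) if density >= density_threshold else []
    return {"density": density, "targets": targets}


# Toy follower graph: user -> accounts that follow that user.
toy_graph = {"a": ["b", "c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}
plan = plan_intervention(toy_graph, k=2, density_threshold=0.3)
print(plan)
```

Running the check per cascade, rather than once per platform, follows the recommendation above; what counts as dense enough would need to be calibrated against real data.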

Topics

Hate Speech Simulation
Multi-LLM Agents
Bluesky Platform
Cascade Modeling
Content Moderation
Social Network Analysis

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.