Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

A controlled simulation study audited content-selection biases across three major Large Language Model (LLM) providers (OpenAI, Anthropic, Google) using real social media datasets from Twitter/X, Bluesky, and Reddit. The researchers ran 540,000 simulated top-10 selections from pools of 100 posts across 54 experimental conditions: three providers, three platforms, and six prompting strategies (general, popular, engaging, informative, controversial, neutral). The study found that polarization was amplified across all configurations, that toxicity handling showed a strong inversion between engagement-focused and information-focused prompts, and that sentiment biases were predominantly negative. Provider comparisons revealed distinct trade-offs: GPT-4o Mini showed the most consistent behavior, while Claude and Gemini exhibited high adaptivity in toxicity handling. On Twitter/X, left-leaning authors were systematically over-represented even though right-leaning authors formed a plurality of the dataset, a pattern that largely persisted across prompts.
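The experimental grid can be checked with a few lines of arithmetic. The sketch below is illustrative only: the list names are hypothetical, and the even split of selections across conditions is an assumption, since the summary reports only the totals.

```python
from itertools import product

# Factor levels as named in the summary; variable names are illustrative.
PROVIDERS = ["OpenAI", "Anthropic", "Google"]
PLATFORMS = ["Twitter/X", "Bluesky", "Reddit"]
PROMPTS = ["general", "popular", "engaging",
           "informative", "controversial", "neutral"]

TOTAL_SELECTIONS = 540_000  # total reported in the summary

# Full factorial design: every provider x platform x prompt combination.
conditions = list(product(PROVIDERS, PLATFORMS, PROMPTS))
n_conditions = len(conditions)

# Assumption: selections are spread evenly over conditions.
selections_per_condition = TOTAL_SELECTIONS // n_conditions

print(n_conditions)              # 54
print(selections_per_condition)  # 10000
```

Under that even-split assumption, each provider, platform, and prompt combination would account for 10,000 of the simulated selections.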

Key takeaway

Research scientists and CTOs deploying LLM-based content curation systems should expect these models to amplify polarization and to exhibit persistent political-leaning biases, even with "neutral" prompts. Prompt engineering can modulate content-level biases such as toxicity and sentiment, but it is insufficient for mitigating demographic disparities. Ensuring equitable content exposure is likely to require more fundamental interventions, such as adversarial debiasing, fairness constraints during ranking, or robust training data curation.
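As one possible reading of "fairness constraints during ranking", the sketch below caps each author group's share of the top-k at roughly its share of the candidate pool. It is not taken from the study: the function name `proportional_top_k`, the tuple layout, and the toy scores are all hypothetical, and proportional capping is only one of many constraint designs.

```python
from collections import Counter

def proportional_top_k(items, k):
    """Pick k item ids by descending score, capping each group at
    ceil(k * group_share_of_pool).

    items: list of (item_id, group, score) tuples (hypothetical layout).
    """
    n = len(items)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the pool size")
    pool_counts = Counter(group for _, group, _ in items)
    # Integer ceiling of k * count / n, avoiding float rounding.
    caps = {g: -(-k * c // n) for g, c in pool_counts.items()}

    chosen, used = [], Counter()
    for item_id, group, _ in sorted(items, key=lambda t: t[2], reverse=True):
        if len(chosen) == k:
            break
        if used[group] < caps[group]:
            chosen.append(item_id)
            used[group] += 1
    return chosen

# Toy pool: group A holds 60% of items and all of the highest scores.
toy_items = [("a1", "A", 0.9), ("a2", "A", 0.8), ("a3", "A", 0.7),
             ("a4", "A", 0.6), ("a5", "A", 0.5), ("a6", "A", 0.4),
             ("b1", "B", 0.35), ("b2", "B", 0.3), ("b3", "B", 0.2),
             ("b4", "B", 0.1)]
top5 = proportional_top_k(toy_items, 5)
print(top5)  # ['a1', 'a2', 'a3', 'b1', 'b2']
```

Without the cap, the top five would all come from group A; with it, the selection mirrors the 60/40 pool split. A constraint like this operates on the ranking output, so it does not depend on how the underlying scores were prompted.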

Key insights

LLM-based content curation consistently amplifies polarization and exhibits demographic biases that are often resistant to prompt engineering.

Principles

Method

A controlled simulation study mapped content selection biases by varying LLM provider, platform, and prompt style, analyzing 540,000 top-10 selections from social media posts.
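One simple way to quantify over-representation in a single selection is to compare each group's share of the selected posts with its share of the candidate pool. The sketch below assumes this ratio as the metric; the summary does not state which measure the study used, and the function name and toy labels are invented for illustration.

```python
from collections import Counter

def representation_ratio(pool_labels, selected_labels):
    """Return share_in_selection / share_in_pool for every label in the pool.

    A ratio above 1 means the group is over-represented in the selection,
    below 1 under-represented.
    """
    if not pool_labels or not selected_labels:
        raise ValueError("pool and selection must be non-empty")
    pool_counts = Counter(pool_labels)
    selected_counts = Counter(selected_labels)
    n_pool, n_selected = len(pool_labels), len(selected_labels)
    return {
        label: (selected_counts[label] / n_selected) / (count / n_pool)
        for label, count in pool_counts.items()
    }

# Toy data (invented, not from the study): a pool of 100 authors in which
# right-leaning authors are the plurality, and one top-10 selection.
pool = ["right"] * 45 + ["left"] * 35 + ["unknown"] * 20
selected = ["left"] * 5 + ["right"] * 3 + ["unknown"] * 2

ratios = representation_ratio(pool, selected)
for label, ratio in ratios.items():
    print(label, round(ratio, 2))
```

In this toy case the left-leaning ratio is above 1 and the right-leaning ratio is below 1, which is the shape of disparity the summary describes for Twitter/X. Averaging such ratios over many selections per condition would give a per-condition bias estimate.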

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Ethicist, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.