MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions

Source: Machine Learning · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MIRAGE (Muslim-Identity Reasoning and Agentic Generation Evaluation) is a 1,200-prompt benchmark that audits anti-Muslim bias in frontier large language models under deployment-realistic conditions. It spans direct completion, chain-of-thought reasoning, and simulated agentic decision-making, covering content moderation, lending triage, refugee claim summarization, and hiring screens. Across six frontier models, chain-of-thought reasoning amplifies Muslim-violence associations by 12-34% relative to direct completion, and agentic decisions show a 9-22 percentage-point asymmetry between Muslim and non-Muslim cases on identical evidence. Bias is also sharply time-coupled to retrieved news context, increasing by 18-27% under recent-conflict retrieval. Existing prompt-based mitigations transfer poorly: they suppress direct-completion bias but leave the agentic asymmetry largely intact.
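The figures above mix two kinds of measurement: a relative change (the 12-34% amplification) and an absolute gap in percentage points (the 9-22 point asymmetry). The sketch below illustrates the arithmetic behind each. The function names and all input rates are hypothetical, chosen only for illustration; they are not MIRAGE's published metric definitions or results.

```python
def relative_change(baseline_rate: float, condition_rate: float) -> float:
    """Relative change of condition_rate over baseline_rate, as a fraction."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (condition_rate - baseline_rate) / baseline_rate


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (rate_a - rate_b) * 100.0


# Hypothetical rates, for illustration only (not taken from the paper).
direct_rate = 0.25  # share of direct completions showing the association
cot_rate = 0.30     # same share under chain-of-thought prompting
amplification = relative_change(direct_rate, cot_rate)

adverse_rate_muslim = 0.40      # adverse-decision rate, Muslim-identified cases
adverse_rate_non_muslim = 0.25  # adverse-decision rate, matched non-Muslim cases
gap_pp = percentage_point_gap(adverse_rate_muslim, adverse_rate_non_muslim)

print(f"relative amplification: {amplification:.0%}")
print(f"decision asymmetry: {gap_pp:.0f} percentage points")
```

Keeping the two quantities separate matters when comparing conditions: a relative change depends on the size of the baseline rate, while a percentage-point gap does not.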

Key takeaway

For AI scientists and ethicists deploying frontier LLMs, traditional single-turn bias evaluations are insufficient. MIRAGE shows that chain-of-thought reasoning and agentic decision-making amplify anti-Muslim bias, particularly when recent news context is retrieved. Prioritize auditing LLMs under deployment-realistic conditions and developing mitigations that hold up in complex, multi-turn interactions, so that outcomes remain equitable.

Key insights

Anti-Muslim bias in LLMs is amplified under complex, real-world deployment conditions, which challenges current evaluation and mitigation methods.

Method

MIRAGE uses 1,200 prompts across direct completion, chain-of-thought, and agentic decision-making to audit anti-Muslim bias in LLMs.
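One way to picture an audit of this shape is a paired-prompt harness: each template is filled once per identity marker, wrapped for each of the three conditions, and the adverse-decision rates are compared. The sketch below is a minimal illustration under stated assumptions, not the benchmark's actual harness. The template text, the wrapper wording, the `toy_model` stub, and every function name are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

CONDITIONS = ("direct", "chain_of_thought", "agentic")


def wrap(prompt: str, condition: str) -> str:
    """Wrap a base prompt for one evaluation condition (wording is illustrative)."""
    if condition == "direct":
        return prompt
    if condition == "chain_of_thought":
        return prompt + "\nThink step by step before answering."
    if condition == "agentic":
        return "You are a triage agent. Decide and act.\n" + prompt
    raise ValueError(f"unknown condition: {condition}")


def audit(
    templates: List[str],
    model: Callable[[str], bool],
    marker_a: str,
    marker_b: str,
) -> Dict[str, float]:
    """Per-condition gap in adverse-decision rate (marker_a minus marker_b).

    `model` returns True when its decision on the prompt is adverse.
    Each template must contain an `{identity}` placeholder.
    """
    adverse = defaultdict(lambda: [0, 0])
    for template in templates:
        prompt_a = template.format(identity=marker_a)
        prompt_b = template.format(identity=marker_b)
        for condition in CONDITIONS:
            adverse[condition][0] += int(model(wrap(prompt_a, condition)))
            adverse[condition][1] += int(model(wrap(prompt_b, condition)))
    n = len(templates)
    return {c: (a - b) / n for c, (a, b) in adverse.items()}


def toy_model(prompt: str) -> bool:
    """Stand-in for a real LLM call; NOT a real model.

    Deliberately biased in the agentic condition only, so the audit
    has a gap to detect. "(Muslim)" does not match "(non-Muslim)".
    """
    return prompt.startswith("You are a triage agent") and "(Muslim)" in prompt


# Hypothetical templates; the evidence is identical across the pair.
templates = [
    "Applicant ({identity}) has stable income and no defaults. Deny the loan?",
    "Post by user ({identity}): 'Join our community dinner.' Remove the post?",
]
gaps = audit(templates, toy_model, "Muslim", "non-Muslim")
for condition, gap in gaps.items():
    print(f"{condition}: gap = {gap * 100:.0f} percentage points")
```

Because only the identity marker differs within a pair, any nonzero gap is attributable to that marker rather than to the evidence. A real audit would replace `toy_model` with calls to the system under test and parse its decisions.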

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Ethicist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.