Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Expert, extended

Summary

Mod-Guide, an LLM-based content moderation feedback system, addresses culturally insensitive speech targeting Bangladesh's Hindu and Chakma communities, the country's largest religious and Indigenous ethnic minorities. Developed between December 2024 and January 2025, this system integrates a co-created corpus of 132 culturally insensitive speech instances and community explanations into moderation pipelines using Retrieval Augmented Generation (RAG). A mixed-method evaluation, involving two experts and 15 participants, compared RAG-enhanced GPT-4 responses against off-the-shelf GPT-4. Results indicate that RAG significantly improves contextual accuracy and the perceived usefulness of moderation feedback, with usefulness varying across ethnic lines but not religious ones. This work contributes a unique dataset and the Mod-Guide artifact, advancing AI ethics and human-computer interaction by incorporating epistemically marginalized perspectives.

Key takeaway

For AI Ethicists and NLP Engineers designing content moderation systems for diverse online communities, recognize that standard LLMs often reinforce majority perspectives, leading to hermeneutical injustice. You should integrate Retrieval Augmented Generation (RAG) with community-sourced, culturally grounded corpora to ensure moderation reflects minority viewpoints. This approach, exemplified by Mod-Guide, significantly enhances contextual accuracy and perceived usefulness, fostering restorative justice and preventing epistemic erasure in your systems.

Key insights

LLMs can achieve culturally sensitive content moderation by integrating retrieval-augmented generation (RAG) with community-sourced, epistemically marginalized perspectives.

Principles

Insensitive speech implicitly disregards minority cultural values.
Content moderation needs restorative justice principles.
Minority datasets are "prototype-based categories."

Method

Mod-Guide's method involves co-creating a culturally insensitive speech corpus via Asynchronous Remote Communities (ARC), integrating it into a GPT-4 RAG pipeline with persona prompting, and evaluating feedback through textual analysis, expert review, and user studies.

In practice

Ground LLM moderation in community-sourced data via RAG.
Use persona-based prompting for diverse moderation roles.
Prioritize minority hermeneutics in data curation.

Topics

LLM Content Moderation
Retrieval-Augmented Generation
AI Ethics
Culturally Insensitive Speech
Minority Perspectives
Restorative Justice

Best for: Research Scientist, AI Scientist, AI Ethicist, NLP Engineer

Related on AIssential

See Counsel's argued verdicts on the open AI decisions leaders are weighing →

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.