Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities
Summary
Mod-Guide, an LLM-based content moderation feedback system, addresses culturally insensitive speech targeting Bangladesh's Hindu and Chakma communities, the country's largest religious and Indigenous ethnic minorities. Developed between December 2024 and January 2025, this system integrates a co-created corpus of 132 culturally insensitive speech instances and community explanations into moderation pipelines using Retrieval Augmented Generation (RAG). A mixed-method evaluation, involving two experts and 15 participants, compared RAG-enhanced GPT-4 responses against off-the-shelf GPT-4. Results indicate that RAG significantly improves contextual accuracy and the perceived usefulness of moderation feedback, with usefulness varying across ethnic lines but not religious ones. This work contributes a unique dataset and the Mod-Guide artifact, advancing AI ethics and human-computer interaction by incorporating epistemically marginalized perspectives.
Key takeaway
For AI Ethicists and NLP Engineers designing content moderation systems for diverse online communities, recognize that standard LLMs often reinforce majority perspectives, leading to hermeneutical injustice. You should integrate Retrieval Augmented Generation (RAG) with community-sourced, culturally grounded corpora to ensure moderation reflects minority viewpoints. This approach, exemplified by Mod-Guide, significantly enhances contextual accuracy and perceived usefulness, fostering restorative justice and preventing epistemic erasure in your systems.
Key insights
LLMs can achieve culturally sensitive content moderation by integrating retrieval-augmented generation (RAG) with community-sourced, epistemically marginalized perspectives.
Principles
- Insensitive speech implicitly disregards minority cultural values.
- Content moderation needs restorative justice principles.
- Minority datasets are "prototype-based categories."
Method
Mod-Guide's method involves co-creating a culturally insensitive speech corpus via Asynchronous Remote Communities (ARC), integrating it into a GPT-4 RAG pipeline with persona prompting, and evaluating feedback through textual analysis, expert review, and user studies.
In practice
- Ground LLM moderation in community-sourced data via RAG.
- Use persona-based prompting for diverse moderation roles.
- Prioritize minority hermeneutics in data curation.
Topics
- LLM Content Moderation
- Retrieval-Augmented Generation
- AI Ethics
- Culturally Insensitive Speech
- Minority Perspectives
- Restorative Justice
Best for: Research Scientist, AI Scientist, AI Ethicist, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.