PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
Summary
Researchers introduced PluRule, a multimodal, multilingual benchmark for evaluating how well AI models can moderate pluralistic online communities on Reddit. The benchmark formalizes moderation as a multiple-choice task: given a comment in its full context, a model must identify which of 2,885 community-defined rules, if any, the comment violates. PluRule comprises 13,371 moderation instances across 1,989 Reddit communities and 9 languages, including 72,675 comments and 3,643 images. State-of-the-art vision-language models showed significant limitations: GPT-5.2 (high reasoning) achieved only 57.7% accuracy, 7.7 percentage points above a 50% trivial baseline. Models performed better on universal rules such as civility (69%) and self-promotion (63%) but struggled with context-dependent rules such as low-effort (43%), relevance (44%), and evidence-based (47%) violations.
Key takeaway
For research scientists developing AI moderation tools, this study highlights a critical gap: current vision-language models cannot effectively handle the contextual nuances of pluralistic community rules. You should prioritize developing models capable of understanding implicit community norms and context-dependent rule interpretations, possibly through fine-tuning on community-specific examples or retrieval-augmented methods, rather than relying on universal rule enforcement.
Key insights
AI models struggle with context-dependent content moderation in pluralistic online communities: GPT-5.2 reaches 57.7% accuracy against a 50% trivial baseline.
Principles
- Community-governed platforms require context-aware moderation.
- Universal rules are easier for AI to detect than nuanced ones.
Method
PluRule formalizes moderation as a multiple-choice task, providing models with comments, community rules, and full conversational context to identify specific rule violations across diverse subreddits and languages.
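To make the multiple-choice framing concrete, here is a minimal Python sketch of how one instance might be turned into a prompt and scored. It is not the benchmark's actual format: the `ModerationInstance` fields, the prompt wording, the lettered options, and the explicit "no violation" choice are all assumptions made for illustration.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import List, Optional

# Hypothetical label for the "no rule violated" option.
NO_VIOLATION = "No rule is violated"


@dataclass
class ModerationInstance:
    """One illustrative item: a target comment plus its community context."""
    community: str
    rules: List[str]           # this community's rules, in display order
    context: List[str]         # earlier comments in the thread, oldest first
    comment: str               # the comment to be judged
    violated: Optional[int]    # index into `rules`, or None if no violation


def build_options(instance: ModerationInstance) -> dict:
    """Map option letters to rule texts, ending with a 'no violation' choice.

    Assumes at most 25 rules per community so the letters A-Z suffice.
    """
    choices = instance.rules + [NO_VIOLATION]
    return dict(zip(ascii_uppercase, choices))


def build_prompt(instance: ModerationInstance) -> str:
    """Render the thread, the target comment, and the lettered options."""
    lines = [f"Community: {instance.community}", "Thread:"]
    lines += [f"  > {c}" for c in instance.context]
    lines.append(f"Comment to judge: {instance.comment}")
    lines.append("Which rule, if any, does the comment violate?")
    lines += [f"{k}. {v}" for k, v in build_options(instance).items()]
    return "\n".join(lines)


def gold_letter(instance: ModerationInstance) -> str:
    """Return the option letter that corresponds to the gold label."""
    index = len(instance.rules) if instance.violated is None else instance.violated
    return ascii_uppercase[index]


def accuracy(instances: List[ModerationInstance], predictions: List[str]) -> float:
    """Fraction of instances whose predicted letter matches the gold letter."""
    correct = sum(gold_letter(i) == p for i, p in zip(instances, predictions))
    return correct / len(instances)
```

In this sketch, a model's reply would be reduced to a single option letter before being passed to `accuracy`; how the real benchmark parses model output is not specified here.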
In practice
- Focus AI moderation on universal rule types first.
- Consider retrieval-augmented approaches for local norms.
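As a rough illustration of the retrieval-augmented idea, the Python sketch below pulls the most similar previously moderated comments from the same community so they can be shown to a model as local precedent. Everything here is an assumption: the `past_decisions` record layout and function names are hypothetical, and the Jaccard token overlap is a simple stand-in for whatever retriever (for example, an embedding model) a real system would use.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_examples(comment, community, past_decisions, k=3):
    """Return up to k past decisions from the same community, most similar first.

    `past_decisions` is a hypothetical list of dicts with the keys
    "community", "comment", and "rule".
    """
    query = tokenize(comment)
    candidates = [d for d in past_decisions if d["community"] == community]
    ranked = sorted(
        candidates,
        key=lambda d: jaccard(query, tokenize(d["comment"])),
        reverse=True,
    )
    return ranked[:k]
```

Restricting candidates to the same community is the point of the design: the retrieved examples are meant to carry that community's own interpretation of its rules rather than a platform-wide norm.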
Topics
- PluRule Benchmark
- Pluralistic Content Moderation
- Vision-Language Models
- Reddit Communities
- Rule Violation Detection
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.