PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

2026-05-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Researchers introduced PluRule, a multimodal, multilingual benchmark for evaluating how well AI models can moderate pluralistic online communities, built from Reddit data. The benchmark formalizes moderation as a multiple-choice task: a model must identify which of 2,885 community-defined rules, if any, a given comment violates when read in its full context. PluRule comprises 13,371 moderation instances across 1,989 Reddit communities and 9 languages, including 72,675 comments and 3,643 images. Evaluation of state-of-the-art vision-language models revealed significant limitations: GPT-5.2 (high reasoning) achieved only 57.7% accuracy, 7.7 percentage points above a 50% trivial baseline. Models performed better on universal rules such as civility (69%) and self-promotion (63%) but struggled with context-dependent rules such as low-effort (43%), relevance (44%), and evidence-based (47%) violations.

Key takeaway

For research scientists developing AI moderation tools, this study highlights a critical gap: current vision-language models do not reliably handle the contextual nuances of pluralistic community rules. Prioritize models that can interpret implicit community norms and context-dependent rules, possibly through fine-tuning on community-specific examples or retrieval-augmented methods, rather than relying on universal rule enforcement alone.

Key insights

AI models struggle with context-dependent content moderation in pluralistic online communities: GPT-5.2 (high reasoning) reached 57.7% accuracy against a 50% trivial baseline.

Principles

Community-governed platforms require context-aware moderation.
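Violations of universal rules (civility, self-promotion) are easier for models to detect than violations of context-dependent rules (low-effort, relevance, evidence-based).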

Method

PluRule formalizes moderation as a multiple-choice task, providing models with comments, community rules, and full conversational context to identify specific rule violations across diverse subreddits and languages.
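The multiple-choice framing described above can be illustrated with a minimal sketch. It is not the benchmark's actual code or data schema: the `ModerationInstance` fields, the prompt wording, the lettered option scheme, and the "No rule is violated" option text are all assumptions made for illustration, and images are left out for brevity.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Dict, List, Optional

# Assumed wording for the "no violation" option; the benchmark's own text may differ.
NO_VIOLATION = "No rule is violated"


@dataclass
class ModerationInstance:
    """Hypothetical container; field names are illustrative, not PluRule's schema."""
    community: str
    rules: List[str]              # the community's own rules
    context: List[str]            # earlier comments in the thread, oldest first
    comment: str                  # the comment to be judged
    violated_rule: Optional[int]  # index into `rules`, or None if nothing is violated


def build_options(instance: ModerationInstance) -> Dict[str, str]:
    """Map option letters to rule texts, with a final 'no violation' option."""
    if len(instance.rules) >= len(ascii_uppercase):
        raise ValueError("too many rules for single-letter options")
    texts = list(instance.rules) + [NO_VIOLATION]
    return {ascii_uppercase[i]: text for i, text in enumerate(texts)}


def gold_label(instance: ModerationInstance) -> str:
    """Return the option letter that counts as the correct answer."""
    if instance.violated_rule is None:
        return ascii_uppercase[len(instance.rules)]
    return ascii_uppercase[instance.violated_rule]


def build_prompt(instance: ModerationInstance) -> str:
    """Render community, thread context, target comment, and options as text."""
    lines = [f"Community: {instance.community}", "Conversation so far:"]
    lines += [f"  {turn}" for turn in instance.context]
    lines += [f"Comment to review: {instance.comment}",
              "Which rule does the comment violate?"]
    lines += [f"{label}. {text}" for label, text in build_options(instance).items()]
    return "\n".join(lines)


def accuracy(predictions: List[str], instances: List[ModerationInstance]) -> float:
    """Fraction of instances where the predicted letter matches the gold letter."""
    if not instances or len(predictions) != len(instances):
        raise ValueError("predictions and instances must be non-empty and aligned")
    correct = sum(p == gold_label(i) for p, i in zip(predictions, instances))
    return correct / len(instances)


example = ModerationInstance(
    community="r/example",
    rules=["Be civil", "No self-promotion"],
    context=["Does anyone have study tips?"],
    comment="Buy my new course, link in bio.",
    violated_rule=1,
)
print(build_prompt(example))
```

Scoring by exact match on a single option letter keeps the evaluation unambiguous, which is one plausible reason to frame moderation as multiple choice rather than free-text judgment.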

In practice

Apply AI moderation first to universal rule types such as civility and self-promotion, where reported accuracy was highest (69% and 63%).
Consider retrieval-augmented approaches for local norms.
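One way to picture a retrieval-augmented approach for local norms is sketched below, under stated assumptions: past moderation decisions from the same community are retrieved by similarity to the new comment and could then be shown to a model as in-context examples. The source does not describe this method. The dictionary keys, the `retrieve_examples` helper, and the use of Jaccard token overlap (a deliberately simple stand-in for a learned embedding model) are all hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_examples(comment: str, community: str,
                      history: List[Dict[str, str]], k: int = 3) -> List[Dict[str, str]]:
    """Return up to k past decisions from the same community, most similar first.

    `history` is a hypothetical list of dicts with keys
    'community', 'comment', and 'rule'.
    """
    query = tokenize(comment)
    candidates = [ex for ex in history if ex["community"] == community]
    ranked = sorted(candidates,
                    key=lambda ex: jaccard(query, tokenize(ex["comment"])),
                    reverse=True)
    return ranked[:k]


# Made-up moderation history, for illustration only.
history = [
    {"community": "r/example", "comment": "buy my new course here", "rule": "No self-promotion"},
    {"community": "r/example", "comment": "this is off topic", "rule": "Stay relevant"},
    {"community": "r/other", "comment": "buy my new course here today", "rule": "No ads"},
]

for ex in retrieve_examples("check out my new course", "r/example", history, k=2):
    print(ex["rule"], "<-", ex["comment"])
```

Filtering by community before ranking reflects the pluralistic premise: the same comment can be acceptable in one community and a violation in another, so examples from other communities are excluded rather than merely down-weighted.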

Topics

PluRule Benchmark
Pluralistic Content Moderation
Vision-Language Models
Reddit Communities
Rule Violation Detection

Code references

osome-iu/PluRule

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.