Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation
Summary
Researchers at Bilibili Inc. introduce CASTER, a task that reframes User-Generated Content (UGC) quality assessment around community resonance rather than aesthetic fidelity. They propose MEDEA, a Multimodal Engagement-Driven Evaluation Architecture that uses a Social Chain-of-Thought (Social-CoT) mechanism to simulate diverse viewer personas and their collective cognitive and emotional reactions. MEDEA is trained with supervised fine-tuning followed by process-supervised reinforcement learning using a Social Alignment Reward. To support the task, the team releases CASTER-Bench, a human-annotated benchmark of 1,485 long-form UGC videos (average length 442 seconds) across 30 categories. In experiments, MEDEA significantly outperforms state-of-the-art baselines, reaching an F1 score of 0.650 on the High-Quality class while providing interpretable reasoning paths. Traditional LMMs, by contrast, exhibit a "Generosity Bias": they over-rationalize average content.
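To make the per-class F1 figure concrete, here is a minimal sketch of how F1 is computed for a single class. The label names ("high", "average", "low") and the toy predictions are invented for illustration and are not taken from CASTER-Bench. The toy predictor over-assigns "high", which is one way a generosity-style bias would show up as low precision on that class.

```python
def class_f1(y_true, y_pred, positive):
    """F1 score for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration (not benchmark data).
y_true = ["high", "high", "average", "average", "average", "low"]
y_pred = ["high", "average", "high", "high", "average", "low"]

# Precision on "high" is 1/3 (two average videos were rated high),
# recall is 1/2, so F1 is 0.4.
high_f1 = class_f1(y_true, y_pred, "high")
print(round(high_f1, 3))
```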
Key takeaway
For Machine Learning Engineers building UGC quality assessment systems, traditional Video Quality Assessment (VQA) models and standard LMMs are not sufficient on their own. Integrate human-centric social reasoning, such as MEDEA's Social-CoT, to predict community resonance more accurately. Combining multimodal inputs with social alignment helps counter the "Generosity Bias" of general LMMs, so that models surface genuinely high-quality content for recommendation and moderation.
Key insights
UGC quality assessment requires human-centric social reasoning, simulating community reactions beyond aesthetic or technical metrics.
Principles
- UGC quality is defined by community resonance, not technical perfection.
- Social Chain-of-Thought (Social-CoT) enables empathetic perspective-taking.
- Social Alignment Reward grounds simulated reasoning in human cognition.
Method
MEDEA uses a three-stage pipeline: (1) constructing a Social-CoT corpus, (2) supervised fine-tuning for multimodal perspective-taking, and (3) process-supervised reinforcement learning with a Social Alignment Reward.
In practice
- Use multimodal inputs (video, title, ASR) for holistic assessment.
- Simulate diverse viewer personas to predict community engagement.
- Train models with social alignment to avoid generic reasoning.
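The first two practices above could be prototyped roughly as follows. This is a sketch under stated assumptions, not MEDEA's actual prompts or aggregation: the `VideoSample` fields, the persona names, the prompt wording, the three-level rating scale, and the conservative tie-break are all hypothetical. Frame captions stand in for the video input, and the model call itself is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VideoSample:
    """Hypothetical container for the multimodal inputs of one video."""
    title: str
    asr_transcript: str
    frame_captions: list[str]  # text stand-in for the visual stream


# Hypothetical personas; the source does not list the ones it uses.
PERSONAS = ["casual browser", "domain enthusiast", "critical reviewer"]

# Assumed rating scale, ordered from most to least conservative.
RATING_ORDER = ["low", "average", "high"]


def build_persona_prompt(sample: VideoSample, persona: str) -> str:
    """Assemble a step-by-step reaction prompt for one simulated viewer."""
    visuals = "; ".join(sample.frame_captions)
    return (
        f"You are a {persona} watching a user-generated video.\n"
        f"Title: {sample.title}\n"
        f"Transcript: {sample.asr_transcript}\n"
        f"Visuals: {visuals}\n"
        "Describe your reaction step by step, then rate the video "
        "as high, average, or low."
    )


def aggregate_ratings(ratings: list[str]) -> str:
    """Majority vote over persona ratings; ties go to the lower rating."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    top = max(counts.values())
    tied = [r for r in RATING_ORDER if counts.get(r, 0) == top]
    return tied[0]


sample = VideoSample(
    title="Restoring a 1980s synthesizer",
    asr_transcript="Today we open up the case and check the power supply.",
    frame_captions=["workbench with tools", "close-up of a circuit board"],
)
prompts = [build_persona_prompt(sample, p) for p in PERSONAS]
```

Each prompt would be sent to a multimodal model, and the per-persona ratings it returns would be passed to `aggregate_ratings`. Breaking ties toward the lower rating is one simple design choice for pushing back against over-generous scores; a trained reward model would replace it in a real system.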
Topics
- User-Generated Content
- Video Quality Assessment
- Social Chain-of-Thought (Social-CoT)
- Large Multimodal Models
- Community Resonance
- CASTER-Bench
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.