Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Digital Media & Streaming · Depth: Expert, extended

Summary

Researchers at Bilibili Inc. introduce CASTER, a new task that redefines User-Generated Content (UGC) quality assessment around community resonance rather than aesthetic fidelity. They propose MEDEA, a Multimodal Engagement-Driven Evaluation Architecture, which uses a Social Chain-of-Thought (Social-CoT) mechanism to simulate diverse viewer personas and their collective cognitive and emotional reactions. MEDEA is trained with supervised fine-tuning followed by process-supervised reinforcement learning with a Social Alignment Reward. To support the task, the team releases CASTER-Bench, a human-annotated benchmark of 1,485 long-form UGC videos (442 seconds long on average) spanning 30 categories. In experiments, MEDEA significantly outperforms state-of-the-art baselines, reaching an F1 score of 0.650 on the High-Quality class while providing interpretable reasoning paths. Conventional Large Multimodal Models (LMMs), by contrast, exhibited a "Generosity Bias": they over-rationalized average content.
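
The headline number is an F1 score computed for the High-Quality class alone. As a minimal sketch, assuming a one-vs-rest definition and three hypothetical label names (`high`, `average`, `low`; CASTER-Bench's actual label scheme is not described here), the function below computes per-class F1. The made-up toy predictions show why a "generous" model that promotes average clips loses precision, and with it F1.

```python
from typing import Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """One-vs-rest F1 for `label`: that class is positive, every other class negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels (not CASTER-Bench data): two high, three average, one low clip.
y_true = ["high", "high", "average", "average", "average", "low"]
# A "generous" model promotes every average clip to high.
generous = ["high", "high", "high", "high", "high", "low"]
# A more calibrated model promotes only one average clip.
calibrated = ["high", "high", "average", "average", "high", "low"]

print(round(per_class_f1(y_true, generous, "high"), 3))    # 0.571
print(round(per_class_f1(y_true, calibrated, "high"), 3))  # 0.8
```

On the toy labels, the generous predictor reaches perfect recall but only 0.4 precision, so its High-Quality F1 (about 0.571) trails the calibrated predictor's 0.8.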

Key takeaway

For Machine Learning Engineers building UGC quality assessment systems, traditional video quality assessment (VQA) models and general-purpose LMMs are not sufficient on their own. You should integrate human-centric social reasoning, such as MEDEA's Social-CoT, to predict community resonance more accurately. Combining multimodal inputs with social alignment helps counter the "Generosity Bias" of general LMMs, making your models better at identifying genuinely high-quality content for recommendation and moderation.

Key insights

UGC quality assessment requires human-centric social reasoning, simulating community reactions beyond aesthetic or technical metrics.

Principles

UGC quality is defined by community resonance, not technical perfection.
Social Chain-of-Thought (Social-CoT) enables empathetic perspective-taking.
Social Alignment Reward grounds simulated reasoning in human cognition.
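
The summary describes Social-CoT only at a high level: the model simulates diverse viewer personas and their collective cognitive and emotional reactions. The sketch below is a hypothetical illustration of an aggregation step. It assumes each simulated persona yields a cognitive and an emotional score in [0, 1] plus a short rationale. The personas, weights, and thresholds are invented for illustration and are not MEDEA's internals, where the reasoning is generated by the model itself.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class PersonaReaction:
    """One simulated viewer's reaction (hypothetical structure)."""
    persona: str       # e.g. "casual viewer"; the persona set is illustrative
    cognitive: float   # how clear/informative the content felt, in [0, 1]
    emotional: float   # how strongly it moved this viewer, in [0, 1]
    rationale: str     # the persona's reasoning, kept for interpretability


def collective_resonance(reactions: list[PersonaReaction], emotional_weight: float = 0.5) -> float:
    """Blend each persona's cognitive and emotional scores, then average over the audience."""
    if not reactions:
        raise ValueError("need at least one persona reaction")
    if not 0.0 <= emotional_weight <= 1.0:
        raise ValueError("emotional_weight must be in [0, 1]")
    blended = [
        (1 - emotional_weight) * r.cognitive + emotional_weight * r.emotional
        for r in reactions
    ]
    return fmean(blended)


def resonance_label(score: float, high: float = 0.7, low: float = 0.4) -> str:
    """Map a collective score to a coarse label using hypothetical thresholds."""
    if score >= high:
        return "high"
    if score < low:
        return "low"
    return "average"


reactions = [
    PersonaReaction("casual viewer", 0.6, 0.9, "Funny, easy to follow."),
    PersonaReaction("domain enthusiast", 0.9, 0.7, "Accurate and well researched."),
    PersonaReaction("skeptic", 0.5, 0.4, "Padded middle section."),
]
score = collective_resonance(reactions)
print(round(score, 3), resonance_label(score))  # 0.667 average
```

Keeping each persona's rationale alongside its scores is what makes the final label traceable, in the spirit of the interpretable reasoning paths the summary mentions. The fixed averaging here is a deliberate simplification.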

Method

MEDEA uses a three-stage pipeline: (1) constructing a Social-CoT corpus, (2) supervised fine-tuning (SFT) for multimodal perspective-taking, and (3) process-supervised reinforcement learning (RL) with the Social Alignment Reward.
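
The summary does not give the Social Alignment Reward's formula. To illustrate what "process-supervised" generally means, the hypothetical reward below scores each intermediate reasoning step and blends that score with whether the final label matches the human annotation. It assumes some judge supplies per-step scores in [0, 1]. `Rollout`, `step_scores`, and `process_weight` are illustrative names; this is a sketch of the general idea, not MEDEA's reward.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled reasoning trace from the policy model (hypothetical structure)."""
    step_scores: list[float]  # per-step alignment scores in [0, 1] from an assumed judge
    predicted_label: str      # the model's final resonance label
    human_label: str          # the human-annotated label for the same video


def social_alignment_reward(rollout: Rollout, process_weight: float = 0.5) -> float:
    """Hypothetical process-supervised reward: mean step score blended with outcome correctness."""
    if not rollout.step_scores:
        raise ValueError("a rollout needs at least one scored reasoning step")
    if any(not 0.0 <= s <= 1.0 for s in rollout.step_scores):
        raise ValueError("step scores must lie in [0, 1]")
    if not 0.0 <= process_weight <= 1.0:
        raise ValueError("process_weight must be in [0, 1]")
    process = sum(rollout.step_scores) / len(rollout.step_scores)  # reward for the reasoning path
    outcome = 1.0 if rollout.predicted_label == rollout.human_label else 0.0  # reward for the answer
    return process_weight * process + (1 - process_weight) * outcome


good_reasoning_right = Rollout([0.9, 0.8, 0.7], "high", "high")
weak_reasoning_right = Rollout([0.2, 0.2], "high", "high")
good_reasoning_wrong = Rollout([0.9, 0.8, 0.7], "high", "average")

for name, r in [("good/right", good_reasoning_right),
                ("weak/right", weak_reasoning_right),
                ("good/wrong", good_reasoning_wrong)]:
    print(name, round(social_alignment_reward(r), 2))
# good/right 0.9, weak/right 0.6, good/wrong 0.4
```

The toy output shows the role of the process term: a correct label reached through weak reasoning earns less than one reached through well-aligned steps. This is one way an RL stage could discourage generic rationalizations.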

In practice

Use multimodal inputs for holistic assessment: the video, its title, and its automatic speech recognition (ASR) transcript.
Simulate diverse viewer personas to predict community engagement.
Train models with social alignment to avoid generic reasoning.
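
The inputs listed above must be packed into one request for an LMM. As a minimal sketch, assuming uniform frame sampling and a plain-text prompt, the code below pairs frame timestamps with the title, a truncated ASR transcript, and a list of personas. `UGCSample`, `build_social_cot_request`, the prompt wording, and the persona names are hypothetical and are not MEDEA's template. The 442-second duration simply matches CASTER-Bench's reported average length.

```python
from dataclasses import dataclass


@dataclass
class UGCSample:
    """Hypothetical container for one video's non-pixel inputs."""
    title: str
    asr_transcript: str  # speech-to-text output for the audio track
    duration_s: float    # video length in seconds


def uniform_frame_times(duration_s: float, num_frames: int) -> list[float]:
    """Timestamps (s) at the center of num_frames equal-length segments."""
    if duration_s <= 0 or num_frames <= 0:
        raise ValueError("duration_s and num_frames must be positive")
    step = duration_s / num_frames
    return [round((i + 0.5) * step, 2) for i in range(num_frames)]


def build_social_cot_request(sample: UGCSample, personas: list[str],
                             num_frames: int = 8, max_asr_chars: int = 2000) -> dict:
    """Pack frame timestamps plus a text prompt asking for per-persona reasoning."""
    persona_lines = "\n".join(f"- {p}" for p in personas)
    text = (
        f"Title: {sample.title}\n"
        f"Transcript (ASR): {sample.asr_transcript[:max_asr_chars]}\n"
        "For each viewer persona below, reason step by step about their cognitive "
        "and emotional reaction, then predict the community's overall resonance "
        "(high / average / low).\n"
        f"{persona_lines}"
    )
    return {"frame_times_s": uniform_frame_times(sample.duration_s, num_frames), "text": text}


sample = UGCSample("Cooking ramen from scratch", "Today we make tonkotsu broth...", 442.0)
request = build_social_cot_request(sample, ["casual viewer", "home cook", "skeptic"], num_frames=4)
print(request["frame_times_s"])  # [55.25, 165.75, 276.25, 386.75]
```

Frame decoding and the model call are deliberately left out, since they depend on the LMM and video library in use. Uniform sampling is only the simplest baseline for long-form clips.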

Topics

User-Generated Content
Video Quality Assessment
Social Chain-of-Thought (Social-CoT)
Large Multimodal Models
Community Resonance
CASTER-Bench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.