TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
Summary
A new dataset, TR-EduVSum, and a consensus framework, AutoMUP (Automatic Meaning Unit Pyramid), have been developed for Turkish educational video summarization. TR-EduVSum comprises 82 Turkish "Data Structures and Algorithms" course videos, each with 36 to 53 independent human summaries, totaling 3,281 summaries. AutoMUP automatically generates gold-standard summaries by extracting meaning units from multiple human summaries, clustering them using multilingual Sentence-Transformer embeddings, and weighting them by inter-participant agreement. The highest-consensus AutoMUP summary (AutoMUP-1) serves as the gold standard. Experimental results demonstrate that AutoMUP-1 summaries achieve high semantic overlap with summaries generated by robust LLMs like Flash 2.5 and GPT-5.1, as measured by BERTScore-F1, ROUGE-L, BLEURT, SBERT, SimCSE, and USE. Ablation studies confirm the critical roles of consensus weighting and clustering in determining summary quality and representativeness.
Key takeaway
For research scientists developing summarization systems for low-resource or morphologically rich languages like Turkish, this work provides a robust, automated framework for generating gold-standard summaries. You should consider adopting a consensus-based approach like AutoMUP to overcome annotator bias and LLM limitations, ensuring reproducibility and high semantic alignment with human judgment, especially when creating new domain-specific datasets.
Key insights
AutoMUP generates high-quality, reproducible gold summaries for Turkish educational videos by leveraging human consensus.
Principles
- Consensus weighting is key for content selection.
- Clustering enhances representativeness and consistency.
- Multiple human summaries reveal common knowledge.
Method
AutoMUP extracts and embeds semantic units from human summaries, clusters them hierarchically by cosine distance, and ranks clusters by support ratio to construct consensus-weighted summaries.
In practice
- Use paraphrase-multilingual-MiniLM-L12-v2 for Turkish embeddings.
- Apply hierarchical clustering with automated threshold selection.
- Rank content units by consensus for quality-graded summaries.
Topics
- TR-EduVSum Dataset
- Educational Video Summarization
- AutoMUP Framework
- Consensus-Based Summarization
- Turkish Language Processing
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.