TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing) · Depth: Expert, extended

Summary

This work introduces TR-EduVSum, a dataset for Turkish educational video summarization, together with AutoMUP (Automatic Meaning Unit Pyramid), a consensus framework for building gold-standard summaries. TR-EduVSum comprises 82 Turkish "Data Structures and Algorithms" course videos, each with 36 to 53 independent human summaries, for a total of 3,281 summaries. AutoMUP generates gold-standard summaries automatically in three steps: it extracts meaning units from the multiple human summaries of a video, clusters those units using multilingual Sentence-Transformer embeddings, and weights the clusters by inter-participant agreement. The highest-consensus AutoMUP summary (AutoMUP-1) serves as the gold standard. In the reported experiments, AutoMUP-1 summaries achieve high semantic overlap with summaries generated by strong LLMs such as Flash 2.5 and GPT-5.1, as measured by BERTScore-F1, ROUGE-L, BLEURT, SBERT, SimCSE, and USE. Ablation studies indicate that both consensus weighting and clustering are critical to summary quality and representativeness.
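Of the metrics listed, ROUGE-L is the only purely lexical one, and it is simple enough to sketch directly. The snippet below is a minimal illustration, not the evaluation code used in the paper: it assumes whitespace tokenization and a balanced F1 (equal weight on precision and recall), and the two example sentences are invented for the demonstration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented example pair: a consensus summary sentence vs. a model output.
gold = "a stack follows the last in first out principle"
model = "a stack uses the last in first out rule"
score = rouge_l_f1(gold, model)
print(round(score, 3))
```

Because this score depends on exact token matches, inflected variants of the same word count as different tokens. In a morphologically rich language such as Turkish, a lexical score like this can understate the overlap between two summaries that say the same thing, which is one reason to read it alongside the embedding-based metrics.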

Key takeaway

For research scientists developing summarization systems for low-resource or morphologically rich languages such as Turkish, this work provides an automated framework for generating gold-standard summaries. Consider adopting a consensus-based approach like AutoMUP when creating new domain-specific datasets: aggregating many independent human summaries helps overcome individual annotator bias and the limitations of LLM-generated references, while keeping gold-standard construction reproducible and semantically aligned with human judgment.

Key insights

AutoMUP generates high-quality, reproducible gold summaries for Turkish educational videos by leveraging human consensus.

Method

AutoMUP extracts meaning units from the human summaries and embeds them, clusters the embeddings hierarchically by cosine distance, and ranks the resulting clusters by support ratio (a measure of inter-participant agreement) to construct consensus-weighted summaries.
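The cluster-then-rank step can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the 2-D vectors stand in for multilingual Sentence-Transformer embeddings, average linkage and the 0.2 distance threshold are arbitrary choices, and the support ratio is assumed here to be the share of participants who contributed at least one unit to a cluster. All function names and example units are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_units(embeddings, threshold):
    """Agglomerative clustering with average linkage (an assumed choice).
    Merging stops once the closest pair of clusters exceeds the threshold."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def rank_by_support(clusters, authors, n_participants):
    """Assumed support ratio: distinct contributing participants / total."""
    scored = [(len({authors[i] for i in c}) / n_participants, sorted(c))
              for c in clusters]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Invented meaning units: (participant, text, toy embedding).
units = [
    ("p1", "A stack is LIFO", [1.0, 0.0]),
    ("p2", "Stacks follow last-in first-out order", [0.9, 0.1]),
    ("p3", "LIFO ordering defines a stack", [0.95, 0.05]),
    ("p1", "Push adds an element", [0.0, 1.0]),
    ("p2", "Push inserts on top", [0.1, 0.9]),
    ("p3", "The lecturer used slides", [-1.0, 0.0]),
]
authors = [u[0] for u in units]
texts = [u[1] for u in units]
vectors = [u[2] for u in units]

clusters = cluster_units(vectors, threshold=0.2)
ranked = rank_by_support(clusters, authors, n_participants=3)

# Keep one representative unit from each of the two best-supported clusters.
summary = [texts[members[0]] for _, members in ranked[:2]]
print(summary)
```

In this toy run, the point all three participants made ranks first, the point two of them made ranks second, and the remark only one participant made falls to the bottom. That is the intuition behind consensus weighting: content is retained in proportion to how many independent summarizers expressed it.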

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.