LLMs for automatic annotation of Mandarin narrative transcripts

2026-05-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computational Linguistics · Depth: Expert, quick

Summary

A study evaluated how well Large Language Models (LLMs) can automatically annotate Mandarin narrative transcripts for narrative macrostructure, using the Multilingual Assessment Instrument for Narratives (MAIN). The researchers compared four LLMs against human annotators on narratives from children, young adults, and older adults. The best-performing LLM reached a kappa of κ = .794 against human raters, close to the human-human reliability of κ = .872, and cut annotation time by 65%. A lightweight, locally deployable model performed significantly worse. Annotation difficulty varied by macrostructure element type: categories that require subtle semantic differentiation were the hardest. Model reliability also dropped on young adult narratives because of their greater lexical variation and semantic ambiguity. The findings indicate that LLMs can support discourse-level annotation in non-English spoken corpora, but human oversight remains necessary for semantically complex tasks.

Key takeaway

NLP engineers working with non-English spoken corpora should consider using LLMs for a first pass of discourse-level annotation; the best model in this study cut annotation time by 65%. Plan for human review all the same, especially for narrative elements that require fine semantic distinctions and for transcripts from lexically diverse speakers, so that annotation quality holds up.

Key insights

LLMs can handle much of the discourse-level annotation of non-English speech transcripts; the best model in this study cut annotation time by 65%.

Principles

LLM reliability varies by semantic complexity.
Greater lexical variation lowers LLM annotation reliability.

Method

Evaluated four LLMs on Mandarin narrative macrostructure annotation with MAIN, comparing their agreement with human raters across three age groups (children, young adults, older adults).
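
The agreement figures above are kappa scores. The summary does not say which kappa variant the study used, so the sketch below assumes Cohen's kappa for two raters (one human, one LLM) assigning a single categorical label per item. The function name and the sample labels are illustrative and do not come from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: one categorical label per item and exactly two raters.
    The study's actual agreement procedure is not given in this summary.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")

    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = element present in the narrative, 0 = absent.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
llm = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(human, llm), 3))  # prints 0.6
```

In this toy example the raters agree on 8 of 10 items (0.8) while chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6. Kappa discounts the agreement two raters would reach by chance, which is why it sits below raw percent agreement.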

In practice

Use LLMs for initial annotation passes.
Prioritize human review for semantically complex elements.
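
One way to turn these two points into a workflow is sketched below, under stated assumptions: a small pilot sample is labelled by both a human and the LLM, per-element agreement is computed, and elements that fall below a chosen threshold are routed to human review. The element names, the 0.8 threshold, and the function names are all hypothetical; none of them come from the study.

```python
from collections import defaultdict


def elements_needing_review(pilot, threshold=0.8):
    """Return element types whose human-LLM agreement falls below threshold.

    pilot: iterable of (element, human_label, llm_label) tuples taken from
    a double-annotated sample. The threshold is an assumed cut-off.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for element, human_label, llm_label in pilot:
        totals[element] += 1
        hits[element] += int(human_label == llm_label)
    return sorted(e for e in totals if hits[e] / totals[e] < threshold)


def route(items, review_elements):
    """Split LLM-annotated items into auto-accepted and human-review queues."""
    auto, manual = [], []
    for item in items:
        (manual if item["element"] in review_elements else auto).append(item)
    return auto, manual


# Hypothetical pilot data: (element, human_label, llm_label).
pilot = [
    ("goal", 1, 1), ("goal", 0, 0), ("goal", 1, 1), ("goal", 1, 1),
    ("internal_state", 1, 0), ("internal_state", 0, 0),
    ("internal_state", 1, 0), ("internal_state", 1, 1),
]
flagged = elements_needing_review(pilot)

# Hypothetical first-pass LLM output for new transcripts.
llm_output = [
    {"id": "t1", "element": "goal", "label": 1},
    {"id": "t2", "element": "internal_state", "label": 0},
]
auto, manual = route(llm_output, set(flagged))
```

Raw per-element agreement keeps the sketch short; a chance-corrected statistic such as kappa per element would be the stricter choice when label distributions are skewed.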

Topics

Large Language Models
Mandarin Narrative Transcripts
Linguistic Annotation
Narrative Macrostructure
Multilingual Assessment Instrument for Narratives

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.