LLMs for automatic annotation of Mandarin narrative transcripts
Summary
A study evaluated how well Large Language Models (LLMs) can automatically annotate Mandarin narrative transcripts, focusing on narrative macrostructure as defined by the Multilingual Assessment Instrument for Narratives (MAIN). The researchers compared four LLMs against human annotators on narratives from children, young adults, and older adults. The top-performing LLM reached a kappa agreement of κ = .794 with human raters, close to the human-human reliability of κ = .872, and cut annotation time by 65%. A lightweight, locally deployable model, however, performed significantly worse. Annotation difficulty varied by macrostructure element type: categories that require subtle semantic distinctions were the hardest for the models. Model reliability also dropped for young adult narratives, which showed more lexical variation and semantic ambiguity. The findings indicate that LLMs can support discourse-level annotation of non-English spoken corpora, but human oversight remains crucial for semantically complex tasks.
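A kappa statistic corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement derived from each rater's label frequencies. The summary does not say which kappa variant the study used. The minimal sketch below assumes Cohen's kappa for two raters, and the present/absent (1/0) labels in the usage example are toy data, not results from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example: hypothetical human vs. LLM present/absent codes for 8 elements.
human = [1, 0, 1, 1, 0, 0, 1, 0]
llm = [1, 0, 1, 0, 0, 0, 1, 1]
print(round(cohens_kappa(human, llm), 3))  # p_o = 0.75, p_e = 0.5
```

Here the raters agree on 6 of 8 items (p_o = 0.75), but with balanced labels half of that agreement is expected by chance (p_e = 0.5), so κ = 0.5. This is why a κ of .794 signals much stronger agreement than the same figure would as a raw percentage.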
Key takeaway
For NLP engineers working with non-English spoken corpora, consider using LLMs for a first pass of discourse-level annotation to save substantial time. Plan for human oversight, however, especially for narrative elements that require nuanced semantic distinctions and for narratives from lexically diverse speakers, to keep annotation quality high.
Key insights
LLMs can automate complex discourse-level annotation of non-English speech transcripts, substantially reducing annotation time.
Principles
- LLM reliability varies by semantic complexity.
- Lexical variation impacts LLM annotation accuracy.
Method
Evaluated four LLMs on MAIN-based macrostructure annotation of Mandarin narratives, comparing their agreement with human raters across children, young adults, and older adults.
In practice
- Use LLMs for initial annotation passes.
- Prioritize human review for semantically complex elements.
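One way to decide which elements need human review is to double-code a small calibration sample and flag element types where LLM-human agreement is low. The sketch below is a hypothetical workflow, not the study's procedure. It uses simple per-element percent agreement because kappa can be unstable on small per-category samples, and the 0.85 threshold and the element names are illustrative assumptions.

```python
from collections import defaultdict


def flag_elements_for_review(calibration, min_agreement=0.85):
    """Return element types whose LLM-human agreement is below a threshold.

    calibration: iterable of (element_type, human_label, llm_label) tuples
    from a human-double-coded sample. min_agreement is a hypothetical cut-off,
    not a value reported in the study.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for element, human, llm in calibration:
        totals[element] += 1
        matches[element] += int(human == llm)
    return sorted(
        element for element in totals
        if matches[element] / totals[element] < min_agreement
    )


def route(annotations, review_types):
    """Split LLM annotations into auto-accepted and human-review queues."""
    accepted, review = [], []
    for item in annotations:
        (review if item["element"] in review_types else accepted).append(item)
    return accepted, review


# Illustrative calibration data (hypothetical element names and labels).
calibration = [
    ("Goal", 1, 1), ("Goal", 0, 0), ("Goal", 1, 1),
    ("InternalState", 1, 0), ("InternalState", 0, 0), ("InternalState", 1, 1),
]
review_types = flag_elements_for_review(calibration)
accepted, review = route(
    [{"element": "Goal", "label": 1}, {"element": "InternalState", "label": 0}],
    set(review_types),
)
print(review_types, len(accepted), len(review))
```

With this toy data, "Goal" agrees on 3 of 3 items and is auto-accepted, while "InternalState" agrees on only 2 of 3 and is routed to human review. In practice the calibration sample would need to be large enough per element type for the agreement estimates to be meaningful.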
Topics
- Large Language Models
- Mandarin Narrative Transcripts
- Linguistic Annotation
- Narrative Macrostructure
- Multilingual Assessment Instrument for Narratives
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.