Evaluating Large Language Models Abilities for Addressee, Turn-change, and Next Speaker Prediction in Meetings

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The study evaluates how well large language models (LLMs) predict turn-taking dynamics in multimodal, multi-party conversations, focusing on three tasks: addressee detection, turn-change prediction, and next speaker prediction. Experiments on the AMI corpus compare text-based LLMs, multimodal LLMs (MM-LLMs), supervised models, and human subjects. Text-based LLMs surpassed both supervised models and human subjects in next speaker prediction, despite lacking domain-specific training and access to audio or visual information. MM-LLMs improved on text-based LLMs for addressee detection and turn-change prediction but did not reach human-level accuracy, which indicates difficulty in making effective use of raw audio-visual signals. Ablation analyses highlighted the critical role of conversational context, particularly for next speaker prediction. Humans and LLMs also showed similar prediction patterns, and both struggled when turn changes were frequent.

Key takeaway

NLP engineers building meeting transcription or assistant tools should prioritize text-based LLMs for next speaker prediction: in this study they outperformed human subjects without any audio or visual input. For addressee detection or turn-change prediction, consider multimodal LLMs, but expect them to fall short of human accuracy when working from raw audio-visual signals. Invest in robust handling of conversational context, which the ablation analyses identify as critical, especially for next speaker prediction.

Key insights

Text-based LLMs excel at next speaker prediction in meetings, even without multimodal input.

Method

An evaluation framework was constructed for addressee detection, turn-change, and next speaker prediction, comparing supervised models, text-based LLMs, MM-LLMs, and humans on the AMI corpus.
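The sketch below illustrates the general shape such an evaluation could take for two of the tasks, next speaker prediction and turn-change prediction. It is not the study's implementation: the `Utterance` type, the `context_size` parameter, the prompt wording, the toy dialogue, and the heuristic baseline are all hypothetical, chosen only to show how examples are derived from a transcript and how a predictor is scored.

```python
from collections import namedtuple

# Hypothetical transcript record; the real AMI annotations are far richer.
Utterance = namedtuple("Utterance", ["speaker", "text"])


def build_examples(dialogue, context_size=3):
    """Return (context, next_speaker, turn_change) for each position after the first."""
    examples = []
    for i in range(1, len(dialogue)):
        context = dialogue[max(0, i - context_size):i]
        next_speaker = dialogue[i].speaker
        # A turn change occurs when the next speaker differs from the current one.
        turn_change = next_speaker != context[-1].speaker
        examples.append((context, next_speaker, turn_change))
    return examples


def format_prompt(context, participants):
    """Render a context window as plain text (assumed wording, not the study's prompt)."""
    lines = [f"{u.speaker}: {u.text}" for u in context]
    header = "Participants: " + ", ".join(participants)
    return header + "\n" + "\n".join(lines) + "\nWho speaks next?"


def previous_speaker_baseline(context):
    """Guess the most recent speaker other than the current one; else the current one."""
    current = context[-1].speaker
    for utterance in reversed(context):
        if utterance.speaker != current:
            return utterance.speaker
    return current


def accuracy(examples, predict):
    """Fraction of examples where predict(context) matches the gold next speaker."""
    correct = sum(predict(context) == gold for context, gold, _ in examples)
    return correct / len(examples)


# Invented toy dialogue for illustration only.
toy_dialogue = [
    Utterance("A", "Shall we start with the remote design?"),
    Utterance("B", "Sure, I have the slides."),
    Utterance("A", "Go ahead."),
    Utterance("B", "First, the button layout."),
    Utterance("C", "Can I ask about cost?"),
    Utterance("B", "Yes, of course."),
]

examples = build_examples(toy_dialogue, context_size=3)
print(format_prompt(examples[-1][0], ["A", "B", "C"]))
print("baseline accuracy:", accuracy(examples, previous_speaker_baseline))
```

To evaluate an LLM in this setup, `previous_speaker_baseline` would be swapped for a function that sends the output of `format_prompt` to a model and parses the returned speaker label. Varying `context_size` gives a simple way to probe how much conversational context a predictor depends on.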

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.