Evaluation of Automatic Speech Recognition Using Generative Large Language Models
Summary
A new study evaluates generative Large Language Models (LLMs) as tools for assessing Automatic Speech Recognition (ASR) output. The goal is to overcome a key limitation of the traditional Word Error Rate (WER) metric: it counts word-level edits without regard to meaning, so an error that reverses a sentence's meaning costs the same as a harmless one. The research explores three approaches: selecting the better of two hypotheses, calculating semantic distance via generative embeddings, and qualitatively classifying errors. On the HATS dataset, the top-performing LLMs agreed with human annotators on hypothesis selection 92–94% of the time, well above WER's 63% and ahead of the other semantic metrics tested. The study also found that embeddings from decoder-based LLMs performed comparably to those from encoder models, suggesting a promising path toward more interpretable, semantically aware ASR evaluation.
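To make the WER limitation concrete, here is a minimal word-level WER sketch: a standard Levenshtein alignment over whitespace tokens, with no text normalization (real toolkits usually lowercase and strip punctuation first). The example sentences are invented for illustration and are not from HATS.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "i will not go to the meeting"
hyp_meaning_flipped = "i will go to the meeting"    # drops "not": meaning reversed
hyp_meaning_kept = "i will not go to the meetings"  # plural slip: meaning intact

print(word_error_rate(reference, hyp_meaning_flipped))
print(word_error_rate(reference, hyp_meaning_kept))
```

Both hypotheses score 1/7 (one edit against seven reference words), even though only the first reverses the speaker's meaning. This is the kind of case that the meaning-aware approaches above are meant to separate.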
Key takeaway
For AI Engineers and Research Scientists evaluating ASR systems, adding generative LLMs to the evaluation pipeline can give a more human-aligned, semantically sensitive assessment than WER alone. Consider LLM-based hypothesis selection or embedding-based semantic distance alongside WER to see which errors actually change meaning, which can help target model improvements where they matter most.
Key insights
Generative LLMs improve ASR evaluation by agreeing much more closely with human judgments than traditional WER does.
Principles
- Semantic metrics correlate better with human judgments than WER does.
- Embeddings from decoder-based LLMs perform comparably to those from encoder models for semantic distance.
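Embedding-based semantic distance usually comes down to pooling per-token hidden states into one vector per transcript and comparing the two vectors with cosine distance. The sketch below assumes you already have per-token hidden states (for example from a decoder LLM) as plain lists of floats; the pooling strategy is an assumption, since the summary does not say how the study pooled decoder states. Last-token pooling is a common option for causal decoders because only the final token attends to the whole input, while mean pooling is a common default for encoders. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(token_states: Sequence[Vector]) -> List[float]:
    """Average the per-token vectors dimension by dimension."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]


def last_token_pool(token_states: Sequence[Vector]) -> List[float]:
    """Use the final token's vector (in a causal decoder it has seen the whole input)."""
    return list(token_states[-1])


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def semantic_distance(ref_states, hyp_states, pool=mean_pool) -> float:
    """Pool each transcript's token states into one vector, then compare with cosine distance."""
    return cosine_distance(pool(ref_states), pool(hyp_states))


# Toy hidden states (2 tokens x 3 dims); real ones would come from a model's hidden layers.
ref_states = [[0.2, 0.9, 0.1], [0.4, 0.7, 0.0]]
hyp_states = [[0.2, 0.8, 0.2], [0.5, 0.6, 0.1]]
print(semantic_distance(ref_states, hyp_states))
print(semantic_distance(ref_states, hyp_states, pool=last_token_pool))
```

Lower values mean the hypothesis stays closer in meaning to the reference; which layer and pooling to use is a tuning decision rather than something this sketch settles.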
Method
Evaluated generative LLMs as ASR evaluators on the HATS dataset in three ways: selecting the better of two hypotheses, computing semantic distance with generative embeddings, and qualitatively classifying errors.
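The agreement figures reported above (92–94% for the best LLMs, 63% for WER) compare a metric's pick of the better hypothesis with the human pick. A minimal sketch of that comparison, assuming each item has two hypotheses, one score per hypothesis, and a single human choice; treating metric ties as disagreements is an assumption of this sketch, not necessarily how the study handled them. The data are invented.

```python
from typing import Sequence


def choose_by_score(score_a: float, score_b: float, lower_is_better: bool = True) -> str:
    """Turn two per-hypothesis scores (e.g. WER, semantic distance) into a pick: 'A', 'B', or 'tie'."""
    if score_a == score_b:
        return "tie"
    a_wins = score_a < score_b if lower_is_better else score_a > score_b
    return "A" if a_wins else "B"


def agreement_rate(metric_choices: Sequence[str], human_choices: Sequence[str]) -> float:
    """Fraction of items where the metric's pick equals the human pick (ties count as misses)."""
    if len(metric_choices) != len(human_choices) or not human_choices:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(m == h for m, h in zip(metric_choices, human_choices))
    return matches / len(human_choices)


# Toy data, not from HATS: (WER of A, WER of B) per item, plus the human pick.
wer_pairs = [(0.10, 0.20), (0.14, 0.14), (0.50, 0.40)]
human_picks = ["A", "A", "B"]

wer_picks = [choose_by_score(a, b) for a, b in wer_pairs]
print(wer_picks)  # ['A', 'tie', 'B']
print(agreement_rate(wer_picks, human_picks))
```

The same `agreement_rate` function works for any evaluator, including an LLM that outputs 'A' or 'B' directly, which keeps the comparison between metrics on equal footing.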
In practice
- Use LLMs for ASR hypothesis selection.
- Explore generative embeddings for semantic distance.
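LLM-based hypothesis selection can be wired up as a prompt plus a strict answer parser. The sketch below is hypothetical throughout: `LLMFn` stands for whatever client call you use to get a text reply, the prompt wording is illustrative rather than the study's, and it assumes the reference transcript is shown to the model. Asking twice with the hypotheses swapped, and accepting only consistent answers, is an added safeguard against position bias, not a step described in the summary.

```python
import re
from typing import Callable, Optional

# Hypothetical interface: any function that sends a prompt to an LLM and returns its text reply.
LLMFn = Callable[[str], str]

# Illustrative prompt; not the wording used in the study.
PROMPT_TEMPLATE = (
    "Reference transcript: {reference}\n"
    "Hypothesis A: {hyp_a}\n"
    "Hypothesis B: {hyp_b}\n"
    "Which hypothesis better preserves the meaning of the reference? "
    "Answer with a single letter, A or B."
)


def parse_choice(reply: str) -> Optional[str]:
    """Return the first standalone 'A' or 'B' in the reply, or None if there is none."""
    match = re.search(r"\b([AB])\b", reply.upper())
    return match.group(1) if match else None


def select_better(llm: LLMFn, reference: str, hyp_a: str, hyp_b: str) -> str:
    """Ask in both orders; return 'A', 'B', or 'undecided' if the answers conflict."""
    first = parse_choice(llm(PROMPT_TEMPLATE.format(reference=reference, hyp_a=hyp_a, hyp_b=hyp_b)))
    # Swapped order: a reply of "A" here refers to the original hyp_b.
    swapped = parse_choice(llm(PROMPT_TEMPLATE.format(reference=reference, hyp_a=hyp_b, hyp_b=hyp_a)))
    second = {"A": "B", "B": "A"}.get(swapped) if swapped else None
    if first is not None and first == second:
        return first
    return "undecided"


# Stand-in "LLMs" for demonstration only; replace with a real client call.
def negation_aware_llm(prompt: str) -> str:
    # Prefers Hypothesis A whenever it keeps the word "not".
    line_a = next(line for line in prompt.splitlines() if line.startswith("Hypothesis A:"))
    return "A" if " not " in line_a else "B"


def position_biased_llm(prompt: str) -> str:
    # Always answers A, regardless of content.
    return "Answer: A"


ref = "i will not go to the meeting"
kept = "i will not go to the meetings"
flipped = "i will go to the meeting"
print(select_better(negation_aware_llm, ref, kept, flipped))   # A
print(select_better(position_biased_llm, ref, kept, flipped))  # undecided
```

The order swap doubles the number of LLM calls; in exchange, a model that always favors the first slot produces 'undecided' instead of silently inflating agreement.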
Topics
- Automatic Speech Recognition
- Large Language Models
- ASR Evaluation
- Word Error Rate
- Semantic Metrics
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published in Computation and Language.