Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The paper proposes a Non-Autoregressive Minimum Bayes' Risk (NAR-MBR) decoding framework that aims to make speech recognition both fast and accurate. Non-autoregressive (NAR) decoding generates output tokens in parallel, so it runs faster than sequential autoregressive (AR) decoding, but it typically recognizes less accurately because no token can be conditioned on previously generated tokens. NAR-MBR addresses this by selecting the hypothesis that maximizes expected utility, estimated from samples drawn from the NAR model's output probability, rather than the hypothesis with the highest probability. Multiple samples are obtained efficiently from a single forward computation. In experiments on LibriSpeech, Switchboard, AMI, and a web presentation corpus, NAR-MBR decoding outperformed previous NAR decoding methods and ran faster than AR decoding.
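The contrast between maximizing probability and maximizing expected utility can be written in the standard textbook form of MBR decoding. This is a sketch, not the paper's exact formulation: the symbols are assumptions introduced here, with x the input speech, H the candidate set, u a utility function (for example, negative edit distance), and y^(1), ..., y^(N) samples drawn from the model.

```latex
% Maximum-probability decoding: pick the single most likely hypothesis.
\hat{y}_{\mathrm{MAP}} = \operatorname*{arg\,max}_{y} \; P(y \mid x)

% MBR decoding: pick the hypothesis with the highest expected utility,
% approximated by an average over N samples from the model.
\hat{y}_{\mathrm{MBR}}
  = \operatorname*{arg\,max}_{y \in \mathcal{H}} \;
    \mathbb{E}_{y' \sim P(\cdot \mid x)} \bigl[ u(y, y') \bigr]
  \approx \operatorname*{arg\,max}_{y \in \mathcal{H}} \;
    \frac{1}{N} \sum_{n=1}^{N} u\bigl(y, y^{(n)}\bigr)
```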

Key takeaway

For NLP engineers and AI scientists optimizing speech recognition systems, NAR-MBR decoding offers a way to balance speed and accuracy: it runs faster than autoregressive decoding while mitigating the accuracy loss typically associated with non-autoregressive decoding. It is worth evaluating for real-time or high-throughput automatic speech recognition (ASR) tasks that need both low latency and high accuracy.

Key insights

NAR-MBR decoding improves recognition accuracy over previous non-autoregressive decoding methods while remaining faster than autoregressive decoding.

Principles

Method

NAR-MBR selects the hypothesis that maximizes expected utility, estimated from samples drawn from the NAR model's output probability. Multiple samples are obtained efficiently with a single forward computation.
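The method can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions made here for illustration: the NAR model is CTC-style and returns one per-frame posterior matrix from a single forward pass, samples are drawn independently per frame and then collapsed, and the utility is negative edit distance. All function names and the `BLANK` id are hypothetical.

```python
import random
from itertools import groupby

BLANK = 0  # hypothetical blank token id, assuming a CTC-style NAR model


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


def collapse(frames):
    """CTC-style collapse: merge repeated tokens, then drop blanks."""
    return tuple(t for t, _ in groupby(frames) if t != BLANK)


def sample_hypotheses(posteriors, num_samples, rng):
    """Draw hypotheses from per-frame posteriors.

    `posteriors` is the output of ONE forward pass (a list of per-frame
    probability lists), so extra samples cost no extra model calls.
    """
    vocab = range(len(posteriors[0]))
    samples = []
    for _ in range(num_samples):
        frames = [rng.choices(vocab, weights=p)[0] for p in posteriors]
        samples.append(collapse(frames))
    return samples


def utility(hyp, ref):
    """Assumed utility: negative edit distance (higher is better)."""
    return -edit_distance(hyp, ref)


def mbr_decode(samples):
    """Return the sampled hypothesis with the highest total utility
    measured against all samples (equivalent to the highest average)."""
    candidates = sorted(set(samples))  # sorted for deterministic tie-breaking
    return max(candidates, key=lambda h: sum(utility(h, r) for r in samples))


if __name__ == "__main__":
    # Toy posteriors: 3 frames over a 3-token vocabulary (index 0 is blank).
    posteriors = [
        [0.1, 0.8, 0.1],
        [0.6, 0.2, 0.2],
        [0.1, 0.1, 0.8],
    ]
    samples = sample_hypotheses(posteriors, num_samples=16, rng=random.Random(0))
    print(mbr_decode(samples))
```

The pairwise utility computation grows quadratically with the number of samples, but it involves only short token sequences and no further model calls, which is consistent with the summary's point that sampling needs just one forward computation.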

Topics

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.