Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition
Summary
A Non-Autoregressive Minimum Bayes' Risk (NAR-MBR) decoding framework is proposed to improve both the speed and the accuracy of speech recognition. Non-autoregressive (NAR) decoding generates output tokens in parallel, which makes it faster than sequential autoregressive (AR) decoding, but it typically recognizes less accurately because it cannot condition on previously generated tokens. NAR-MBR addresses this by choosing the output that maximizes expected utility, estimated from samples drawn from the NAR model's output probability, rather than the output that maximizes the probability itself. Multiple samples can be obtained efficiently from a single forward computation. In experiments on LibriSpeech, Switchboard, AMI, and a web presentation corpus, NAR-MBR decoding outperformed previous NAR decoding methods and ran faster than AR decoding.
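The contrast between the two decoding criteria can be written out as follows. This is the generic sample-based form of MBR decoding, not necessarily the exact formulation in the paper; the symbols x (input speech), y_i (sampled hypotheses), N (number of samples), H (candidate set) and u (utility function, for example a negated edit distance) are notation assumed here for illustration.

```latex
% Maximum-probability decoding: pick the single most likely output.
\hat{y}_{\mathrm{MAP}} = \arg\max_{y} \; P_{\mathrm{NAR}}(y \mid x)

% Sample-based MBR decoding: pick the candidate with the highest
% average utility against N samples from the same distribution.
\hat{y}_{\mathrm{MBR}} = \arg\max_{y \in \mathcal{H}} \; \frac{1}{N} \sum_{i=1}^{N} u(y, y_i),
\qquad y_i \sim P_{\mathrm{NAR}}(\cdot \mid x)
```

A common choice, assumed here, is to take the candidate set H to be the sampled hypotheses themselves.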
Key takeaway
NLP engineers and AI scientists optimizing speech recognition systems should consider NAR-MBR decoding when both speed and accuracy matter. The framework runs faster than traditional autoregressive decoding while mitigating the accuracy loss typically associated with non-autoregressive decoding. It is worth exploring for real-time or high-throughput automatic speech recognition (ASR) tasks that need both low latency and high accuracy.
Key insights
NAR-MBR decoding improves the accuracy of non-autoregressive speech recognition over previous NAR decoding methods while remaining faster than autoregressive decoding.
Principles
- NAR decoding offers speed but cannot condition on previously generated tokens.
- MBR can resolve NAR model uncertainty.
- Parallel sampling boosts efficiency.
Method
NAR-MBR maximizes expected utility from samples drawn from the NAR model's output probability, efficiently obtaining multiple samples with a single forward computation.
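A minimal sketch of the idea, under stated assumptions: the "model output" is a toy list of independent per-position token distributions standing in for the result of one NAR forward pass, the utility is the negated edit distance, and the candidate set is the set of samples itself. The function names (`sample_hypotheses`, `mbr_select`) are hypothetical, and details of the paper's actual model and utility are not given in this summary.

```python
import random
from typing import List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def sample_hypotheses(probs: List[List[float]], vocab: List[str],
                      n: int, rng: random.Random) -> List[Hypothesis]:
    """Draw n hypotheses from per-position distributions.

    Assumption: `probs` is what a single NAR forward pass produced, so
    drawing more samples needs no further model computation.
    """
    return [tuple(rng.choices(vocab, weights=p)[0] for p in probs)
            for _ in range(n)]


def mbr_select(hyps: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest average utility
    (negated edit distance) against all sampled hypotheses."""
    def expected_utility(cand: Hypothesis) -> float:
        return -sum(edit_distance(cand, h) for h in hyps) / len(hyps)
    return max(hyps, key=expected_utility)


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    # Toy distributions for a 3-token output (illustrative values only).
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
    samples = sample_hypotheses(probs, vocab, n=16, rng=random.Random(0))
    print(mbr_select(samples))
```

Selecting among the samples costs a number of utility evaluations that grows quadratically with the sample count, so in this sketch the number of samples is the main knob trading accuracy against decoding time.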
In practice
- Implement NAR-MBR for faster ASR.
- Apply MBR to parallel generation tasks.
- Optimize sample generation in NAR models.
Topics
- Speech Recognition
- Non-Autoregressive Decoding
- Minimum Bayes' Risk
- Decoding Algorithms
- ASR Performance
- Parallel Processing
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.