Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction
Summary
A new retrieval-augmented self-taught reasoning model with adaptive chain-of-thought (RASTAR) has been developed to correct named entity errors in Automatic Speech Recognition (ASR) systems. This framework addresses the common issue of ASR misrecognizing domain-specific phrases, which can lead to significant downstream failures. RASTAR comprises two main components: a rephrasing language model (RLM) for robust named entity recognition (NER) that uses contextual semantic understanding, and an adaptive self-taught reasoning model (A-STAR) that dynamically adjusts its reasoning depth based on task difficulty. Experiments on the AISHELL-1 and Homophone datasets demonstrated RASTAR's effectiveness, achieving relative reductions in named entity character error rate (NE-CER) of 17.96% and 34.42%, respectively, compared to a DANCER baseline. The model also showed improved reasoning efficiency, particularly for larger models like Qwen3-8B, which reduced token usage by 30% and 21% on the respective datasets.
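To make the reported metric concrete, here is a minimal sketch of how a named entity character error rate and a relative reduction could be computed. The summary does not give the paper's exact NE-CER definition, so the formula below (character edit distance over entity spans, divided by the number of reference entity characters) is an assumption, and the function names are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ne_cer(pairs):
    """Assumed NE-CER: total edit errors over total reference entity characters.

    pairs is an iterable of (reference_entity, recognized_entity) strings.
    """
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction of a system with respect to a baseline."""
    return (baseline - system) / baseline
```

Under this reading, a 17.96% relative reduction means the error rate fell by 17.96% of the baseline's value, not by 17.96 percentage points.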
Key takeaway
For NLP engineers and research scientists working on ASR error correction, RASTAR specifically targets named entity misrecognitions. Consider its rephrasing language model for more robust NER and its adaptive chain-of-thought mechanism to improve correction accuracy while limiting reasoning tokens, especially with larger LLMs. The reported gains are largest on the Homophone dataset (34.42% relative NE-CER reduction versus 17.96% on AISHELL-1), which suggests the approach helps most in phonetically challenging cases.
Key insights
RASTAR improves ASR named entity correction by combining robust NER with adaptive, self-taught reasoning.
Principles
- Contextual semantic understanding enhances NER robustness.
- Adaptive reasoning depth reduces computational overhead.
- Self-training with preference optimization improves model performance.
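The adaptive reasoning depth principle can be sketched as a small router that maps an estimated difficulty to a reasoning budget. In RASTAR the difficulty classification is described as part of the A-STAR model itself. The hand-written rule, the three difficulty levels, the token budgets, and all names below are hypothetical stand-ins chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningPlan:
    difficulty: str
    max_reasoning_tokens: int
    instruction: str


# Hypothetical budgets; the summary does not state the actual levels or limits.
PLANS = {
    "easy": ReasoningPlan("easy", 0, "Answer directly without reasoning steps."),
    "medium": ReasoningPlan("medium", 128, "Reason briefly, then answer."),
    "hard": ReasoningPlan("hard", 512, "Reason step by step, then answer."),
}


def classify_difficulty(num_candidates: int, top_score_gap: float) -> str:
    """Toy stand-in for a learned difficulty classifier (assumption).

    num_candidates: how many retrieved entity candidates are in play.
    top_score_gap: similarity gap between the best and second-best candidate.
    """
    if num_candidates <= 1:
        return "easy"
    if top_score_gap >= 0.2:
        return "medium"
    return "hard"


def plan_reasoning(num_candidates: int, top_score_gap: float) -> ReasoningPlan:
    """Pick a chain-of-thought budget from the estimated difficulty."""
    return PLANS[classify_difficulty(num_candidates, top_score_gap)]
```

The design point is that easy cases skip the chain of thought entirely, so reasoning tokens are spent only where candidates are hard to tell apart.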
Method
RASTAR first applies a rephrasing language model for NER, then retrieves candidate entities at the phonetic level. An adaptive self-taught reasoning model (A-STAR) then classifies the difficulty of each problem and adjusts its chain-of-thought depth accordingly. A-STAR is trained via direct preference optimization (DPO) on self-distilled data.
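The phonetic-level retrieval step might look roughly like the sketch below: entities in a lexicon are stored with a phonetic key, and a misrecognized span is matched against those keys by sequence similarity. The lexicon entries, the toy phoneme strings, the similarity measure (`difflib.SequenceMatcher`), and the thresholds are all assumptions for illustration and are not taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical entity lexicon: surface form -> space-separated phonetic key.
LEXICON = {
    "Qwen": "k w eh n",
    "Quinn": "k w ih n",
    "Gwen": "g w eh n",
    "Kyoto": "k y ow t ow",
}


def phonetic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed over phoneme tokens rather than letters."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve_candidates(query_phonemes, lexicon, top_k=3, min_sim=0.5):
    """Return up to top_k entity names whose phonetic key resembles the query."""
    scored = [
        (phonetic_similarity(query_phonemes, key), name)
        for name, key in lexicon.items()
    ]
    scored = [item for item in scored if item[0] >= min_sim]
    # Highest similarity first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]
```

Calling `retrieve_candidates("k w eh n", LEXICON)` yields a short list of similar-sounding entities that a downstream reasoning model could choose among.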
In practice
- Use RLM for NER in noisy ASR transcripts.
- Implement adaptive CoT to optimize LLM inference costs.
- Apply DPO for self-training reasoning models.
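For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss on sequence log-probabilities. It illustrates the general objective only. How RASTAR builds its preference pairs from self-distilled data is not described in this summary, and the argument names and default `beta` are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) computed as softplus(-x) in a numerically stable form.
    return math.log1p(math.exp(-abs(x))) + max(-x, 0.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, and it equals log 2 when the two models agree exactly.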
Topics
- Automatic Speech Recognition
- Named Entity Correction
- Retrieval-Augmented Generation
- Adaptive Chain-of-Thought
- Self-Taught Reasoning
Best for: NLP Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.