Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations
Summary
This work proposes an ontology memory-augmented framework for automatic speech recognition (ASR) correction in long text-speech interleaved conversations. It targets a specific difficulty: the evidence needed to fix an error is sparse and buried in a long, noisy dialogue history. The framework therefore organizes the preceding interaction history into a dynamically updatable ontology memory. This memory stores entities, terminology, surface variants, likely ASR confusions, and semantic relations as retrievable nodes for context-grounded correction. For evaluation, the authors built the RAMC-Corr dataset from MagicData-RAMC, targeting long-range ASR correction with grounded context. On RAMC-Corr, the method improves over direct correction in 9 of 10 paired backbone-setting combinations. For example, with Qwen2.5-14B under zero-shot prompting, C-CER drops from 35.66% to 29.04%, an absolute reduction of 6.62 points (about 18.6% relative). The code and dataset are publicly available.
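The memory described above can be pictured as a keyed collection of nodes. The following is a minimal, hypothetical Python schema, assuming one node per canonical entity or term. The names `OntologyNode`, `OntologyMemory`, and `upsert`, and the example entries, are illustrative and are not taken from the released code. Re-inserting an existing term merges new variants, confusions, and relations into its node, which is one simple way to make the memory dynamically updatable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One retrievable memory entry (hypothetical schema)."""
    canonical: str                                              # form the term should take in the transcript
    node_type: str                                              # e.g. "entity" or "terminology"
    surface_variants: set[str] = field(default_factory=set)     # legitimate alternative spellings
    asr_confusions: set[str] = field(default_factory=set)       # likely misrecognitions of the term
    relations: dict[str, set[str]] = field(default_factory=dict)  # relation name -> related canonical terms


class OntologyMemory:
    """Conversation-level memory, updated as new dialogue turns arrive."""

    def __init__(self) -> None:
        self.nodes: dict[str, OntologyNode] = {}

    def upsert(self, canonical, node_type, variants=(), confusions=(), relations=None):
        """Create the node if it is new, otherwise merge the new information into it.

        `variants` and `confusions` are iterables of strings (not single strings).
        """
        node = self.nodes.get(canonical)
        if node is None:
            node = OntologyNode(canonical, node_type)
            self.nodes[canonical] = node
        node.surface_variants.update(variants)
        node.asr_confusions.update(confusions)
        for rel, targets in (relations or {}).items():
            node.relations.setdefault(rel, set()).update(targets)
        return node


# Usage: a later turn adds a new confusion and a relation to the same node.
memory = OntologyMemory()
memory.upsert("MagicData", "entity", variants=["Magic Data"], confusions=["magic date"])
memory.upsert("MagicData", "entity", confusions=["magic later"], relations={"related_to": {"RAMC"}})
```

Keying nodes by canonical form keeps updates idempotent; a real system would also need a way to decide when two extracted mentions refer to the same node, which this sketch leaves out.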
Key takeaway
If you are a machine learning engineer improving ASR in long, interleaved conversations, note that direct correction, where an LLM rewrites each transcript with only the raw dialogue history as context, often fails to locate the sparse evidence it needs. An ontology memory-augmented framework can substantially improve accuracy: it grounds corrections in structured, dynamically updated dialogue evidence and reduces the harmful over-editing common with direct LLM rewriting. Consider integrating such a memory mechanism to resolve context-dependent ASR errors.
Key insights
Ontology memory significantly improves ASR correction in long, interleaved conversations by providing structured, dynamically updated contextual evidence.
Principles
- Long-range ASR correction benefits from structured, dynamic context.
- Explicit memory reduces over-editing and grounds corrections.
- Raw dialogue history is insufficient for long-context ASR correction.
Method
The framework extracts knowledge from the preceding dialogue history into a conversation-level ontology memory, which is updated dynamically as the conversation proceeds. For each speech segment, a correction model retrieves the relevant ontology evidence and uses it for context-grounded refinement of the ASR transcript.
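Retrieval for a new speech segment can be sketched as matching spans of the ASR hypothesis against each node's surface forms, then passing the hits to the correction model as evidence. This is a hypothetical illustration, not the paper's retriever or prompt: it swaps in simple fuzzy string matching (`difflib.SequenceMatcher`) over word n-grams, assumes space-delimited text, and the `MEMORY` contents, `threshold`, and prompt wording are all assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical memory: canonical form -> surface variants and known ASR confusions.
MEMORY: dict[str, set[str]] = {
    "MagicData": {"Magic Data", "magic date"},
    "RAMC": {"ram see"},
}


def ngrams(tokens: list[str], max_n: int):
    """Yield all word n-grams of length 1..max_n (assumes space-delimited text)."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def retrieve(hypothesis: str, memory: dict[str, set[str]],
             threshold: float = 0.8, max_n: int = 3) -> list[tuple[float, str, str]]:
    """Return (score, canonical, matched_span) for nodes whose forms fuzzily match a span."""
    spans = list(ngrams(hypothesis.lower().split(), max_n))
    best: dict[str, tuple[float, str]] = {}
    for canonical, forms in memory.items():
        candidates = {canonical.lower()} | {f.lower() for f in forms}
        for span in spans:
            for form in candidates:
                score = SequenceMatcher(None, span, form).ratio()
                # Keep only the best-scoring span per node, and only above the threshold.
                if score >= threshold and score > best.get(canonical, (0.0, ""))[0]:
                    best[canonical] = (score, span)
    return sorted(((s, c, span) for c, (s, span) in best.items()), reverse=True)


def build_prompt(hypothesis: str, evidence: list[tuple[float, str, str]]) -> str:
    """Format retrieved evidence and the hypothesis into a correction prompt."""
    lines = [f"- '{span}' may refer to '{canonical}'" for _, canonical, span in evidence]
    return (
        "Correct ASR errors only where the evidence supports it; "
        "otherwise return the transcript unchanged.\n"
        "Evidence:\n" + ("\n".join(lines) if lines else "- none") + "\n"
        f"Transcript: {hypothesis}"
    )
```

Telling the model to leave the transcript unchanged when no evidence applies is meant to echo the source's point that explicit memory reduces over-editing; the exact phrasing here is illustrative only.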
In practice
- Construct a dynamic ontology memory from the dialogue history.
- Use retrieved ontology evidence to ground ASR corrections.
- Evaluate ASR correction on the RAMC-Corr dataset.
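Results on RAMC-Corr are reported in C-CER, whose exact definition is given in the paper. As a reference point only, the sketch below computes plain character error rate (CER): the character-level Levenshtein distance summed over utterances, divided by the total number of reference characters. Stripping all whitespace before scoring is an assumption (common for Mandarin) and may differ from the paper's normalization.

```python
from __future__ import annotations


def normalize(text: str) -> str:
    # Assumption: ignore all whitespace, as is common for Mandarin CER.
    return "".join(text.split())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    refs = [normalize(r) for r in refs]
    hyps = [normalize(h) for h in hyps]
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / total
```

Summing edits over the corpus before dividing, rather than averaging per-utterance rates, keeps short utterances from dominating the score.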
Topics
- Automatic Speech Recognition
- ASR Correction
- Ontology Memory
- Conversational AI
- Long-Context Processing
- RAMC-Corr Dataset
- Large Language Models
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.