Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations
Summary
An ontology memory-augmented correction framework for automatic speech recognition (ASR) is proposed for long text-speech interleaved conversations. Conventional ASR correction struggles in extended interactions because the evidence needed to fix an error is sparse and surrounded by noise. The framework organizes the preceding interaction history into a dynamically updatable ontology memory, storing entities, terminology, surface variants, potential ASR confusions, and semantic relations as retrievable nodes. To evaluate the approach, the authors constructed RAMC-Corr, a dataset derived from MAGIC-RAMC for long-range ASR correction with grounded context. On RAMC-Corr, the method improves over direct correction in 9 out of 10 paired backbone-setting combinations and produces more selective, evidence-grounded corrections of context-dependent ASR errors.
Key takeaway
For NLP engineers or AI scientists developing conversational AI systems with long, text-speech interleaved interactions, consider integrating an ontology memory-augmented ASR correction approach. It targets the problem of sparse contextual evidence by turning conversation history into structured, retrievable memory that the corrector can consult. Evaluate it on your own dialogue systems before adopting it, since the reported gains were measured on RAMC-Corr.
Key insights
An ontology memory framework improves ASR correction in long, interleaved dialogues by structuring conversation history into retrievable, context-grounded evidence.
Principles
- ASR correction needs conversation-level context.
- Ontology memory organizes history effectively.
- Structured context improves error correction.
Method
The framework organizes interaction history into a dynamically updatable ontology memory, storing entities, terminology, surface variants, potential ASR confusions, and semantic relations as retrievable nodes for context-grounded correction.
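To make the node structure concrete, the following is a minimal Python sketch of what such a memory could look like. It is not the authors' implementation: the class names `OntologyNode` and `OntologyMemory`, the field layout, and the substring-based `retrieve` are all assumptions chosen for illustration, and a real system would likely use a stronger retriever.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One retrievable memory entry (hypothetical schema, not from the paper)."""
    canonical: str                                  # preferred surface form
    kind: str = "entity"                            # e.g. "entity" or "term"
    variants: set = field(default_factory=set)      # alternative surface forms
    confusions: set = field(default_factory=set)    # likely ASR mis-hearings
    relations: dict = field(default_factory=dict)   # relation name -> other canonical form


class OntologyMemory:
    """Dynamically updatable store keyed by lowercased canonical form."""

    def __init__(self):
        self.nodes = {}

    def upsert(self, canonical, kind="entity", variants=(), confusions=(), relations=None):
        # Create the node on first sight, otherwise merge new evidence into it.
        node = self.nodes.setdefault(canonical.lower(), OntologyNode(canonical, kind))
        node.variants.update(v.lower() for v in variants)
        node.confusions.update(c.lower() for c in confusions)
        node.relations.update(relations or {})
        return node

    def retrieve(self, hypothesis):
        # Assumption: plain substring matching stands in for a real retriever.
        text = hypothesis.lower()
        hits = []
        for node in self.nodes.values():
            surfaces = {node.canonical.lower()} | node.variants | node.confusions
            if any(surface in text for surface in surfaces):
                hits.append(node)
        return hits


memory = OntologyMemory()
memory.upsert("RAMC-Corr", kind="term", confusions=["ram sea core"])
memory.upsert("RAMC-Corr", variants=["RAMC Corr"])  # later turn adds a variant
hits = memory.retrieve("we evaluated on ram sea core yesterday")
```

In this sketch, each earlier turn of the conversation calls `upsert`, and the nodes returned by `retrieve` would be passed to the correction model as grounded context for the current ASR hypothesis.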
In practice
- Apply structured memory to long dialogues.
- Improve ASR accuracy in mixed-modality systems.
- Enable selective, evidence-grounded corrections.
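One way to read "selective, evidence-grounded" is that a correction is applied only when the memory holds enough support for it. The sketch below illustrates that idea under stated assumptions: the `confusion_map` layout, the per-entry evidence count, and the `min_evidence` threshold are hypothetical and are not described in the source.

```python
import re


def correct_selectively(hypothesis, confusion_map, min_evidence=2):
    """Replace mis-heard phrases only when the stored evidence is strong enough.

    confusion_map maps a mis-heard phrase to (canonical form, evidence count).
    Both the layout and the threshold are illustrative assumptions.
    """
    corrected = hypothesis
    applied = []
    for wrong, (canonical, evidence) in confusion_map.items():
        if evidence < min_evidence:
            continue  # too little support: leave the ASR output untouched
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A callable replacement avoids backslash interpretation in `canonical`.
        corrected, count = pattern.subn(lambda _match: canonical, corrected)
        if count:
            applied.append((wrong, canonical))
    return corrected, applied


confusion_map = {
    "ram sea": ("RAMC", 3),   # seen several times in earlier turns
    "magic": ("MAGIC", 1),    # weak evidence, so it is skipped
}
text, applied = correct_selectively("the ram sea data is magic", confusion_map)
```

Returning the list of applied substitutions alongside the text keeps each correction traceable to the memory entry that justified it.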
Topics
- ASR Correction
- Ontology Memory
- Conversational AI
- Long-range Context
- Text-Speech Interleaved
- RAMC-Corr Dataset
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.