Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

The paper proposes an ontology memory-augmented framework for automatic speech recognition (ASR) correction in long text-speech interleaved conversations. The core difficulty is that the evidence needed to correct a given segment is sparse and buried in a long, noisy dialogue history. The framework addresses this by organizing the preceding interaction history into a dynamically updatable ontology memory, which stores entities, terminology, surface variants, potential ASR confusions, and semantic relations as retrievable nodes for context-grounded correction. For evaluation, the RAMC-Corr dataset was constructed from MagicData-RAMC and designed for long-range ASR correction with grounded context. On RAMC-Corr, the method improves over direct correction in 9 of 10 paired backbone-setting combinations. For example, with Qwen2.5-14B under zero-shot prompting, C-CER drops from 35.66% to 29.04%, an absolute reduction of 6.62 points. The code and dataset are publicly available.
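The summary lists what each memory node holds but not its schema. The Python sketch below shows one plausible representation, assuming each node is keyed by a canonical form and carries sets of surface variants and likely ASR confusions plus typed relations to other nodes. The class names `OntologyNode` and `OntologyMemory` and the `upsert` method are hypothetical, not the authors' code, and the extraction step that fills these fields from dialogue history is left out.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyNode:
    """One retrievable memory node (hypothetical schema, not the paper's)."""
    canonical: str  # normalized entity or term
    node_type: str  # e.g. "entity" or "terminology"
    variants: set[str] = field(default_factory=set)    # surface forms seen in history
    confusions: set[str] = field(default_factory=set)  # likely ASR misrecognitions
    relations: dict[str, set[str]] = field(default_factory=dict)  # relation -> related canonicals


class OntologyMemory:
    """Conversation-level memory that is updated as new turns arrive."""

    def __init__(self) -> None:
        self.nodes: dict[str, OntologyNode] = {}

    def upsert(
        self,
        canonical: str,
        node_type: str,
        variants: Iterable[str] = (),
        confusions: Iterable[str] = (),
        relations: Optional[dict[str, Iterable[str]]] = None,
    ) -> OntologyNode:
        # Merge into an existing node rather than duplicating it, so later
        # turns can attach new variants or confusions to an earlier entity.
        # (node_type is only used when the node is first created.)
        node = self.nodes.get(canonical)
        if node is None:
            node = OntologyNode(canonical, node_type)
            self.nodes[canonical] = node
        node.variants.update(variants)
        node.confusions.update(confusions)
        for relation, targets in (relations or {}).items():
            node.relations.setdefault(relation, set()).update(targets)
        return node


# Hypothetical example: a dataset name introduced early, misrecognized later.
memory = OntologyMemory()
memory.upsert("MagicData-RAMC", "entity", variants={"RAMC"})
memory.upsert("MagicData-RAMC", "entity", confusions={"magic data ram c"},
              relations={"is_a": {"dataset"}})
```

Merging on the canonical key is what makes the memory "dynamically updatable" in this sketch: a confusion observed late in the conversation attaches to the entity introduced earlier instead of creating a duplicate node.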

Key takeaway

If you are a Machine Learning Engineer improving ASR in long, interleaved conversations, note that traditional correction methods often struggle to locate the sparse evidence buried in lengthy dialogue history. An ontology memory-augmented framework can improve accuracy: in the reported experiments it reduced C-CER relative to direct correction in 9 of 10 backbone-setting combinations. It grounds each correction in dynamically updated, structured dialogue evidence, which reduces the harmful over-editing common with direct LLM rewriting. Consider integrating such a memory mechanism to improve the resolution of context-dependent ASR errors.
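One way to make the over-editing point concrete is an acceptance check that keeps a model's rewrite only when every change it makes is backed by memory evidence. This is a hypothetical illustration, not a mechanism described in the summary: the paper attributes reduced over-editing to grounding corrections in ontology evidence, while the function name `grounded_edit` and its all-or-nothing fallback rule are assumptions. The sketch diffs hypothesis and proposal word by word with Python's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def grounded_edit(hypothesis: str, proposal: str, allowed_terms: list[str]) -> str:
    """Keep `proposal` only if every change it makes is an allowed memory term.

    hypothesis: raw ASR output; proposal: a model's rewrite of it;
    allowed_terms: canonical forms retrieved from memory (hypothetical input).
    Falls back to the unmodified hypothesis on any unsupported edit.
    """
    src, tgt = hypothesis.split(), proposal.split()
    allowed = {term.lower() for term in allowed_terms}
    for tag, _i1, _i2, j1, j2 in SequenceMatcher(None, src, tgt).get_opcodes():
        if tag == "equal":
            continue
        # For 'delete' the new span is empty, so deletions are always rejected.
        new_span = " ".join(tgt[j1:j2]).lower()
        if new_span not in allowed:
            return hypothesis
    return proposal


hyp = "we fine tuned queen 2.5 yesterday"
print(grounded_edit(hyp, "we fine tuned Qwen2.5 yesterday", ["Qwen2.5"]))
# -> we fine tuned Qwen2.5 yesterday
print(grounded_edit(hyp, "we carefully fine tuned Qwen2.5 yesterday", ["Qwen2.5"]))
# -> we fine tuned queen 2.5 yesterday  (unsupported insertion "carefully")
```

The rule is deliberately conservative: an unsupported insertion such as an added adverb, or any deletion, discards the whole rewrite, trading some missed corrections for fewer harmful edits.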

Key insights

Ontology memory improves ASR correction in long, interleaved conversations by providing structured, dynamically updated contextual evidence; on RAMC-Corr it outperformed direct correction in 9 of 10 backbone-setting combinations.

Principles

Method

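The framework extracts knowledge from the preceding interaction history (entities, terminology, surface variants, potential ASR confusions, and semantic relations) into a conversation-level ontology memory that is updated as the dialogue proceeds. For each speech segment, a correction model retrieves the relevant ontology nodes and uses them as evidence for context-grounded refinement of the ASR output.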
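The retrieve-then-correct step can be sketched as follows, assuming the memory is flattened to a mapping from each canonical form to its known surface forms (variants and recorded ASR confusions). The summary does not say how the paper retrieves nodes, so fuzzy string matching with `difflib.SequenceMatcher` over word n-grams stands in for the real retriever; `retrieve`, `build_prompt`, the 0.8 threshold, and the prompt wording are all hypothetical.

```python
from difflib import SequenceMatcher


def word_ngrams(words: list[str], max_n: int) -> list[str]:
    # All contiguous word spans of length 1..max_n.
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def retrieve(segment: str, memory: dict[str, list[str]],
             threshold: float = 0.8, max_n: int = 3) -> list[str]:
    """Return canonical forms whose known surfaces fuzzily match a span of `segment`.

    `memory` maps canonical form -> surface forms (variants and recorded ASR
    confusions). difflib similarity is a stand-in for the paper's retriever.
    """
    spans = word_ngrams(segment.lower().split(), max_n)
    scores: dict[str, float] = {}
    for canonical, surfaces in memory.items():
        best = max(
            (SequenceMatcher(None, span, surface.lower()).ratio()
             for surface in [canonical, *surfaces] for span in spans),
            default=0.0,
        )
        if best >= threshold:
            scores[canonical] = best
    # Most similar evidence first.
    return sorted(scores, key=scores.get, reverse=True)


def build_prompt(segment: str, evidence: list[str]) -> str:
    # Restrict the corrector to retrieved evidence to discourage free rewriting.
    lines = [f"- {term}" for term in evidence] or ["- (no relevant memory)"]
    return ("Correct ASR errors in the segment using only the evidence below; "
            "leave it unchanged if no evidence applies.\n"
            "Evidence:\n" + "\n".join(lines) +
            f"\nSegment: {segment}\nCorrected:")


# Hypothetical memory and segment for illustration.
memory = {"Qwen2.5": ["queen 2.5"], "MagicData-RAMC": ["magic data ram c"]}
segment = "we fine tuned queen 2.5 on the new data"
print(build_prompt(segment, retrieve(segment, memory)))
```

Matching against recorded confusions, not only canonical spellings, is what lets a misrecognition such as "queen 2.5" surface the intended term; the resulting prompt would then go to whichever correction backbone is in use.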

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.