Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper proposes an ontology memory-augmented framework for automatic speech recognition (ASR) correction in long text-speech interleaved conversations. The core difficulty is that the evidence needed to fix a given error is sparse and buried in a lengthy, noisy dialogue history. The framework addresses this by organizing the preceding interaction history into a dynamically updatable ontology memory that stores entities, terminology, surface variants, potential ASR confusions, and semantic relations as retrievable nodes for context-grounded correction. To evaluate the approach, the authors construct RAMC-Corr, a dataset derived from MagicData-RAMC and designed for long-range ASR correction with grounded context. On RAMC-Corr, the proposed method improves over direct correction in 9 of 10 paired backbone-setting combinations; with Qwen2.5-14B under zero-shot prompting, for example, C-CER drops from 35.66% to 29.04%. The code and dataset are publicly available.
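To put the reported Qwen2.5-14B figures in perspective, the short calculation below converts them into an absolute and a relative reduction. It uses only the two C-CER values quoted above; the helper name `relative_reduction` is illustrative, and the exact definition of C-CER is not given in this summary.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error rate, as a fraction of the starting value."""
    return (before - after) / before


# C-CER (%) quoted in the summary for Qwen2.5-14B under zero-shot prompting.
before, after = 35.66, 29.04

absolute = before - after                      # percentage points
relative = relative_reduction(before, after)   # fraction of the baseline error

print(f"absolute: {absolute:.2f} points, relative: {relative:.1%}")
# prints "absolute: 6.62 points, relative: 18.6%"
```

In other words, the quoted result removes a little under one fifth of the baseline error for that backbone and setting.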

Key takeaway

For Machine Learning Engineers working on ASR in long text-speech interleaved conversations: direct LLM rewriting of transcripts is prone to harmful over-editing, and the evidence needed for a fix is hard to locate in raw dialogue history. An ontology memory-augmented framework instead grounds each correction in dynamically updated, structured dialogue evidence, and on RAMC-Corr it improved over direct correction in 9 of 10 backbone-setting combinations. Consider integrating such a memory mechanism to resolve context-dependent ASR errors.

Key insights

Ontology memory significantly improves ASR correction in long, interleaved conversations by providing structured, dynamically updated contextual evidence.

Principles

Long-range ASR correction benefits from structured, dynamic context.
Explicit memory reduces over-editing and grounds corrections.
Raw dialogue history is insufficient for long-context ASR correction.

Method

The framework extracts knowledge from history into a conversation-level ontology memory, which is dynamically updated. For speech segments, a correction model retrieves relevant ontology evidence for context-grounded refinement.
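A minimal sketch of what such a memory might look like, assuming a simple in-process store. The class names, field names, and the substring-based retrieval score are all assumptions made for illustration; the summary does not describe the paper's actual schema or retriever, and the example terms are invented.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One retrievable memory entry (field names are illustrative)."""
    canonical: str
    kind: str = "entity"                                  # e.g. "entity" or "term"
    variants: set[str] = field(default_factory=set)       # other surface forms
    confusions: set[str] = field(default_factory=set)     # likely ASR mis-hearings
    relations: dict[str, set[str]] = field(default_factory=dict)


class OntologyMemory:
    """Conversation-level store that is updated as the dialogue proceeds."""

    def __init__(self) -> None:
        self.nodes: dict[str, OntologyNode] = {}

    def upsert(self, canonical, kind="entity", variants=(), confusions=(), relations=None):
        """Create the node if needed, then merge new evidence into it."""
        node = self.nodes.setdefault(canonical, OntologyNode(canonical, kind))
        node.variants.update(variants)
        node.confusions.update(confusions)
        for rel, targets in (relations or {}).items():
            node.relations.setdefault(rel, set()).update(targets)
        return node

    def retrieve(self, hypothesis: str, top_k: int = 3):
        """Rank nodes by how many of their known surface forms occur in the
        ASR hypothesis (a stand-in for whatever retriever the paper uses)."""
        text = hypothesis.lower()
        scored = []
        for node in self.nodes.values():
            forms = {node.canonical} | node.variants | node.confusions
            hits = sum(1 for form in forms if form.lower() in text)
            if hits:
                scored.append((hits, node.canonical, node))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [node for _, _, node in scored[:top_k]]


memory = OntologyMemory()
memory.upsert("Kubernetes", kind="term", variants={"k8s"}, confusions={"cooper netties"})
memory.upsert("Grafana", kind="term", confusions={"graph anna"})
memory.upsert("Kubernetes", relations={"monitored_by": {"Grafana"}})  # later update

hits = memory.retrieve("please restart the cooper netties pod")
print([node.canonical for node in hits])  # prints "['Kubernetes']"
```

The retrieved nodes would then be passed to the correction model as evidence alongside the ASR hypothesis, so the model only needs to consider a handful of relevant entries rather than the full dialogue history.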

In practice

Construct dynamic ontology memory from dialogue history.
Use retrieved ontology evidence to ground ASR corrections.
Evaluate ASR correction with the RAMC-Corr dataset.
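The three steps above can be sketched end to end as follows. This is a toy illustration under several assumptions: memory is reduced to a plain mapping from confusable form to canonical form, only text turns update it, the extractor and corrector are hand-written stand-ins for the LLM components, and the metric is plain corpus-level character error rate rather than the paper's C-CER, whose definition is not given in this summary. All function names and the example turns are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (r != h)))    # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


def run_conversation(turns, extract_terms, correct):
    """Walk turns in order: text turns update memory, speech turns are corrected
    against the memory built from everything that came before them."""
    memory: dict[str, str] = {}   # confusable surface form -> canonical form
    outputs = []
    for turn in turns:
        if turn["modality"] == "text":
            memory.update(extract_terms(turn["content"]))
        else:
            outputs.append(correct(turn["content"], memory))
    return outputs


def toy_extract(text):
    """Hypothetical extractor: a fixed lookup instead of an LLM."""
    known = {"Kubernetes": ["cooper netties"]}
    return {c: term for term, cs in known.items() if term in text for c in cs}


def toy_correct(hypothesis, memory):
    """Hypothetical corrector: replace known confusions, touch nothing else."""
    for confusion, canonical in memory.items():
        hypothesis = hypothesis.replace(confusion, canonical)
    return hypothesis


turns = [
    {"modality": "text", "content": "We deploy on Kubernetes."},
    {"modality": "speech", "content": "restart the cooper netties pod"},
]
refs = ["restart the Kubernetes pod"]
uncorrected = [t["content"] for t in turns if t["modality"] == "speech"]
corrected = run_conversation(turns, toy_extract, toy_correct)

print(f"CER before: {cer(refs, uncorrected):.3f}, after: {cer(refs, corrected):.3f}")
```

Because the toy corrector only rewrites spans that match a memory entry, it leaves the rest of the hypothesis untouched, which mirrors the point above about grounding corrections to limit over-editing.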

Topics

Automatic Speech Recognition
ASR Correction
Ontology Memory
Conversational AI
Long-Context Processing
RAMC-Corr Dataset
Large Language Models

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.