A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A calculus-based framework has been proposed to formally estimate the optimal vocabulary size for end-to-end Automatic Speech Recognition (ASR) systems. In hybrid ASR systems the vocabulary size is clearly defined by linguistic units, whereas end-to-end systems derive their tokens from the training text corpus, which makes vocabulary size a critical hyper-parameter. Tokenization algorithms such as BPE, WordPiece, and ULM require the vocabulary size as an input, yet the chosen value is rarely given formal justification in the literature or in toolkits such as ESPnet. The framework builds on prior work by fitting a curve to training data and applying the first and second derivative tests from calculus. The approach was validated on the standard LibriSpeech corpus, demonstrating that an optimally chosen vocabulary size significantly improves ASR performance.

Key takeaway

For AI engineers and research scientists tuning end-to-end ASR systems, this calculus-based framework offers a formal way to choose the vocabulary size instead of selecting it arbitrarily. The approach was validated on the LibriSpeech corpus, where an optimally chosen vocabulary size improved ASR performance. Consider adding the derivative tests to your tokenization pipeline when setting this hyper-parameter.

Key insights

A calculus-based framework formally determines optimal vocabulary size for end-to-end ASR systems.

Method

The method fits a curve to training data and applies the first and second derivative tests from calculus to estimate the optimal vocabulary size, a hyper-parameter of end-to-end ASR.
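
The general idea of fitting a curve and applying derivative tests can be sketched as follows. This is a minimal illustration and not the authors' implementation: the summary does not say which quantity is fitted or which curve family is used, so the polynomial fit, the hypothetical `costs` measurements, and the function name `optimal_vocab_size` are all assumptions made for this example.

```python
import numpy as np


def optimal_vocab_size(sizes, costs, degree=3):
    """Return the vocabulary size that minimises a fitted polynomial cost curve.

    `sizes` and `costs` are hypothetical measurements (one cost value per
    candidate vocabulary size). Returns None if no interior minimum exists.
    """
    x = np.asarray(sizes, dtype=float)
    y = np.asarray(costs, dtype=float)

    # Rescale the x-axis to (0, 1] so the polynomial fit is well conditioned.
    scale = x.max()
    poly = np.poly1d(np.polyfit(x / scale, y, degree))

    first = poly.deriv(1)   # f'(v)
    second = poly.deriv(2)  # f''(v)

    lo, hi = x.min() / scale, 1.0
    minima = []
    for root in first.roots:
        # First derivative test: keep real critical points inside the data range.
        if abs(root.imag) > 1e-9 or not (lo <= root.real <= hi):
            continue
        # Second derivative test: f''(v) > 0 means a local minimum.
        if second(root.real) > 0:
            minima.append(root.real)

    if not minima:
        return None
    best = min(minima, key=poly)  # lowest fitted cost among the local minima
    return best * scale


if __name__ == "__main__":
    # Synthetic, made-up data for illustration only.
    sizes = [100, 300, 500, 1000, 2000, 3000]
    costs = [(v - 1000) ** 2 / 1e6 + 5 for v in sizes]
    print(optimal_vocab_size(sizes, costs, degree=2))
```

The returned value is continuous, so in practice it would be rounded to an integer before being passed to a tokenizer as its vocabulary size.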

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.