Automatic Speech Recognition for Child Reading: A Phonemic Approach using Isolated Words in Brazilian Portuguese
Summary
A new methodology improves Automatic Speech Recognition (ASR) models for assessing reading decoding in children reading Brazilian Portuguese. The researchers fine-tuned Wav2Vec2.0 models, changing the transcription target from orthographic to phonemic. Using a novel corpus of 5,400 isolated-word audio samples from 2nd- and 3rd-grade elementary school children, the study compared pre-trained Portuguese and multilingual models. The phonemic approach, combined with fine-tuning, data augmentation, and adapted tokenization, significantly reduced the Phoneme Error Rate (PER). The work addresses data scarcity and the high variability of child speech, and supports the use of ASR for detailed diagnosis of decoding errors and phonological acquisition, which goes beyond what commercial tools offer.
Key takeaway
Research scientists developing tools for child literacy assessment should consider adopting a phonemic transcription paradigm when fine-tuning ASR models. Combined with data augmentation and adapted tokenization, this approach helps overcome child speech variability and data scarcity, enabling more accurate diagnosis of reading decoding errors than current commercial solutions.
Key insights
A phonemic approach with fine-tuned Wav2Vec2.0 significantly improves ASR for child reading assessment in Brazilian Portuguese.
Principles
- Phonemic transcription reduces ASR error rates.
- Fine-tuning pre-trained models enhances performance.
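The error rate behind these principles is the Phoneme Error Rate (PER). As a minimal sketch, assuming PER is computed in the usual way (edit distance between reference and predicted phoneme sequences, divided by the number of reference phonemes), it could look like the following. The function names and the example transcriptions are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edit operations / total reference phonemes."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phonemes = sum(len(r) for r in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    return total_errors / total_phonemes


# Illustrative (made-up) example: one substituted phoneme across eight.
refs = [["k", "a", "z", "a"], ["g", "a", "t", "u"]]
hyps = [["k", "a", "s", "a"], ["g", "a", "t", "u"]]
print(phoneme_error_rate(refs, hyps))  # 0.125
```

Because each error is a single phoneme substitution, deletion, or insertion, the alignment itself (not only the score) can point to which sound a child misread.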
Method
The methodology fine-tunes Wav2Vec2.0 models to output phonemic rather than orthographic transcriptions, applies data augmentation, and adapts tokenization to child speech.
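The summary does not describe how tokenization was adapted, so the following is only a sketch of one plausible reading: replacing a character vocabulary with a phoneme vocabulary in which multi-character symbols such as "tʃ" are single tokens. The function names, special tokens, and sample transcriptions are assumptions for illustration.

```python
import json

SPECIAL_TOKENS = ["[PAD]", "[UNK]"]  # assumed special tokens for a CTC-style vocabulary


def build_phoneme_vocab(transcriptions):
    """Map every phoneme seen in space-separated transcriptions to an integer id."""
    phonemes = sorted({p for t in transcriptions for p in t.split()})
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + phonemes)}


def encode(transcription, vocab):
    """Convert a space-separated phoneme string to ids; unseen phonemes map to [UNK]."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(p, unk_id) for p in transcription.split()]


# Hypothetical phonemic transcriptions (not from the paper's corpus).
train = ["k a z a", "g a t u", "tʃ i a"]
vocab = build_phoneme_vocab(train)

# Splitting on whitespace keeps "tʃ" as one token instead of two characters.
print(json.dumps(vocab, ensure_ascii=False))
print(encode("tʃ i a", vocab))
```

A JSON mapping of this shape is the kind of vocabulary file that CTC tokenizers such as Hugging Face's Wav2Vec2CTCTokenizer load, though whether the authors used that class is not stated here.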
In practice
- Use phonemic transcription for child ASR.
- Apply data augmentation to child speech datasets.
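The summary does not say which augmentations the authors applied. As a hedged illustration of the second recommendation, the sketch below shows two common waveform-level techniques, additive noise at a target signal-to-noise ratio and speed perturbation by linear interpolation, using only the standard library. The function names and parameter values are assumptions.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at approximately the requested SNR (in dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 shortens the audio.

    Note: this simple resampling shifts pitch along with tempo.
    """
    if factor <= 0 or not samples:
        raise ValueError("factor must be positive and samples non-empty")
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Usage on a synthetic 1-second, 16 kHz tone standing in for a recorded word.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
augmented = [add_noise(tone, 20, rng), change_speed(tone, 0.9), change_speed(tone, 1.1)]
print([len(a) for a in augmented])
```

Each augmented copy keeps the original phonemic label, so a small corpus of isolated words can yield several training examples per recording.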
Topics
- Automatic Speech Recognition
- Child Reading Assessment
- Wav2Vec2.0 Fine-tuning
- Phonemic Transcription
- Brazilian Portuguese
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.