Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models
Summary
A systematic study investigated combinations of acoustic features and acoustic models to improve dysarthric speech recognition, a task made difficult by the high acoustic variability that impaired articulation produces. Using the TORGO database, the study found that adding pitch features improved recognition performance, particularly on sentence recognition tasks. With the Factorized Time Delay Neural Network (F-TDNN) model, the methods achieved a 4.65% relative improvement in isolated word recognition and a 4.63% relative improvement in sentence recognition compared to prior research. This gain is attributed to a deliberate choice of the number of overlapping frames between consecutive training example chunks, which helps compensate for speech variability.
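To make the "relative improvement" figures concrete, here is a minimal sketch of how such a number is typically computed. The summary does not say which metric the study used, so the function handles both an error rate (lower is better) and an accuracy (higher is better). The function name and the example values are hypothetical and are not taken from the study.

```python
def relative_improvement(baseline: float, new: float, lower_is_better: bool = True) -> float:
    """Return the relative improvement of `new` over `baseline`, in percent.

    With lower_is_better=True the inputs are treated as error rates
    (for example word error rate); otherwise as accuracies.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (baseline - new) if lower_is_better else (new - baseline)
    return 100.0 * delta / baseline


# Hypothetical numbers, not from the study:
# an error rate that falls from 20.0% to 19.0%.
example = relative_improvement(20.0, 19.0)
print(f"{example:.2f}% relative improvement")
```

A relative figure scales the gain by the baseline, so the same absolute change counts for more when the baseline is already small.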
Key takeaway
Machine learning engineers building speech recognition systems for dysarthric or otherwise challenging speech should prioritize adding pitch features to their acoustic models. With F-TDNN architectures, this approach yielded a 4.65% relative improvement in isolated word recognition. The number of overlapping frames between training chunks also deserves careful tuning, because it helps the model cope with speech variability.
Key insights
Incorporating pitch features and tuning the frame overlap between training chunks improves dysarthric speech recognition with F-TDNN models: 4.65% relative on isolated words and 4.63% relative on sentences, compared to prior research.
Principles
- Dysarthric speech recognition benefits from pitch features.
- Acoustic variability in dysarthric speech can be compensated for.
- Careful selection of the number of overlapping frames between training chunks enhances model performance.
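As an illustration of the first principle, the sketch below appends per-frame pitch values to per-frame spectral features, which is one common way to combine the two. It is an assumption about the general technique, not the study's actual pipeline. The function name `append_pitch`, the toy values, and the choice of two pitch values per frame (a fundamental frequency and a voicing score) are all hypothetical.

```python
from typing import List, Sequence


def append_pitch(
    spectral: Sequence[Sequence[float]],
    pitch: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate pitch values onto spectral features, frame by frame."""
    if len(spectral) != len(pitch):
        raise ValueError("spectral and pitch must have the same number of frames")
    return [list(s) + list(p) for s, p in zip(spectral, pitch)]


# Toy data (hypothetical): 3 frames, 4 spectral coefficients per frame,
# plus 2 pitch-related values per frame.
spectral_frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.4, 0.5],
    [0.3, 0.4, 0.5, 0.6],
]
pitch_frames = [
    [120.0, 0.9],
    [118.5, 0.8],
    [0.0, 0.1],
]

combined = append_pitch(spectral_frames, pitch_frames)
print(len(combined), "frames,", len(combined[0]), "values per frame")
```

The acoustic model then receives a wider input vector per frame, and the rest of the training setup can stay unchanged.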
Method
Systematically evaluate combinations of acoustic features and acoustic models on dysarthric speech databases, adding pitch features and tuning the number of overlapping frames between training chunks for F-TDNN.
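A systematic comparison like this can be organized as a grid of configurations, one per combination. The sketch below shows the idea only. Apart from "F-TDNN" and the use of pitch features, the feature-set names, model names, and overlap values are hypothetical placeholders, and no training code is implied.

```python
from itertools import product

# Hypothetical option lists; only "F-TDNN" and pitch features come from the summary.
FEATURE_SETS = ["spectral", "spectral+pitch"]
ACOUSTIC_MODELS = ["baseline-model", "F-TDNN"]
CHUNK_OVERLAPS = [0, 10, 20]  # overlapping frames between training chunks


def build_grid(features, models, overlaps):
    """Return one config dict per (feature set, model, overlap) combination."""
    return [
        {"features": f, "model": m, "chunk_overlap": o}
        for f, m, o in product(features, models, overlaps)
    ]


grid = build_grid(FEATURE_SETS, ACOUSTIC_MODELS, CHUNK_OVERLAPS)
for config in grid:
    print(config)
```

Each config would be trained and scored separately, which lets the effect of pitch features be compared against the effect of chunk overlap.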
In practice
- Integrate pitch features into dysarthric speech models.
- Tune the number of overlapping frames between training chunks for F-TDNN training.
- Use the TORGO database for dysarthric speech research.
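The overlap setting can be pictured as a sliding window over an utterance's frames. The sketch below is a simplified illustration and not the chunking logic of any particular toolkit. The function `chunk_ranges` and its parameters are hypothetical, and it assumes a trailing partial chunk is simply dropped.

```python
from typing import List, Tuple


def chunk_ranges(num_frames: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges for fixed-length chunks.

    Consecutive chunks share `overlap` frames. A trailing partial
    chunk is dropped (an assumption made for this sketch).
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    ranges = []
    start = 0
    while start + chunk_len <= num_frames:
        ranges.append((start, start + chunk_len))
        start += step
    return ranges


# A 10-frame utterance cut into 4-frame chunks, with and without overlap.
print(chunk_ranges(10, 4, 0))
print(chunk_ranges(10, 4, 2))
```

A larger overlap produces more training chunks from the same utterance and shows the model each frame in more than one context, at the cost of more computation per epoch.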
Topics
- Dysarthric Speech Recognition
- Acoustic Features
- Pitch Features
- F-TDNN
- Speech Variability
- TORGO Database
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.