Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A systematic study evaluated combinations of acoustic features and acoustic models for dysarthric speech recognition, a task made difficult by the high acoustic variability that impaired articulation produces. Using the TORGO database, the study found that adding pitch features improved recognition performance, particularly on sentence recognition. Applied to the Factorized Time Delay Neural Network (F-TDNN) model, the methods achieved a 4.65% relative improvement in isolated word recognition and a 4.63% relative improvement in sentence recognition over prior research. The gain is attributed to a deliberate choice of the number of overlapping frames between consecutive training example chunks, which helps compensate for speech variability.
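
The "relative improvement" figures above can be read with a short calculation. The following is a minimal sketch, assuming the metric is an error rate such as word error rate (the summary does not say which metric was used); the function name and the numbers are illustrative, not taken from the study.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction in error rate, in percent.

    Assumes both arguments are error rates on the same scale
    (for example, word error rate in percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers, chosen only to illustrate the arithmetic.
baseline = 20.0   # error rate of the earlier system, in percent
improved = 19.0   # error rate of the new system, in percent

absolute_gain = baseline - improved                      # 1.0 percentage point
relative_gain = relative_improvement(baseline, improved)  # 5.0 percent
```

A relative improvement is therefore always larger than the corresponding absolute change in percentage points whenever the baseline error is below 100%.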

Key takeaway

Machine learning engineers building speech recognition systems for dysarthric or other challenging speech should consider adding pitch features to their acoustic models. With an F-TDNN architecture, the study reports a 4.65% relative improvement in isolated word recognition over prior research. The number of overlapping frames between consecutive training chunks also deserves deliberate tuning, since the study credits this choice with compensating for speech variability.
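
One common way to add pitch features is to concatenate them frame by frame onto the spectral features before they reach the acoustic model. The sketch below assumes that approach; the summary does not describe how the study combined the features, and the function name, feature dimensions, and random arrays are hypothetical placeholders for real extractor output.

```python
import numpy as np


def append_pitch(spectral: np.ndarray, pitch: np.ndarray) -> np.ndarray:
    """Concatenate per-frame pitch features onto spectral features.

    spectral: array of shape (num_frames, spectral_dim)
    pitch:    array of shape (num_frames, pitch_dim) or (num_frames,)
    """
    if pitch.ndim == 1:
        pitch = pitch[:, None]  # treat a 1-D pitch track as one feature column
    # Separate extractors can disagree by a frame or two at the edges,
    # so truncate both streams to the shorter length before stacking.
    n = min(len(spectral), len(pitch))
    return np.hstack([spectral[:n], pitch[:n]])


rng = np.random.default_rng(0)
spectral_feats = rng.standard_normal((200, 40))  # hypothetical 40-dim spectral features
pitch_feats = rng.standard_normal((199, 3))      # hypothetical 3-dim pitch features

combined = append_pitch(spectral_feats, pitch_feats)
print(combined.shape)
```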

Key insights

Adding pitch features and tuning the frame overlap between training chunks improves dysarthric speech recognition with F-TDNN models, by 4.65% (isolated words) and 4.63% (sentences) relative to prior research.

Method

Systematically evaluate combinations of acoustic features and acoustic models on a dysarthric speech database (TORGO in this study), adding pitch features and tuning the number of overlapping frames between consecutive training chunks for the F-TDNN.
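
The chunk overlap can be pictured as a sliding window over the frame-level features of an utterance. This is a minimal sketch under that assumption, not the study's actual pipeline; `chunk_with_overlap`, the chunk size, and the overlap value are illustrative names and settings.

```python
import numpy as np


def chunk_with_overlap(feats: np.ndarray, chunk_size: int, overlap: int) -> list:
    """Split a (num_frames, dim) feature matrix into fixed-size chunks.

    Consecutive chunks share `overlap` frames. Trailing frames that do not
    fill a whole chunk are dropped in this simplified version.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    last_start = len(feats) - chunk_size
    return [feats[s:s + chunk_size] for s in range(0, last_start + 1, step)]


feats = np.arange(20).reshape(10, 2)  # toy utterance: 10 frames, 2 dims

no_overlap = chunk_with_overlap(feats, chunk_size=4, overlap=0)
with_overlap = chunk_with_overlap(feats, chunk_size=4, overlap=2)
print(len(no_overlap), len(with_overlap))
```

A larger overlap yields more training examples from the same utterance, and each frame appears in more contexts, which is one plausible reason the setting matters for highly variable speech.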

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.