Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features
Summary
A study of twenty-four subjects tested how well Speech Articulatory Coding (SPARC) features linearly predict surface electromyography (sEMG) envelopes during aloud, mimed, and subvocal speech. Using elastic-net multivariate temporal response function (mTRF) models with sentence-level cross-validation, the authors found that SPARC features predicted sEMG more accurately than phoneme one-hot representations at most electrodes and in all three speech modes. Aloud and mimed speech performed comparably, and articulatory activity remained detectable above chance in subvocal speech. Variance partitioning showed a significant unique contribution from SPARC and only a minimal unique contribution from phoneme features. The mTRF weight patterns gave anatomically interpretable relationships between electrode sites and articulatory movements, and these patterns were consistent across speech modes. The results support SPARC as a robust and interpretable intermediate target for sEMG-based silent-speech modeling.
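The variance partitioning mentioned in the summary can be illustrated with a small sketch. It is not the study's pipeline: it assumes ordinary least squares and in-sample R² instead of cross-validated elastic-net mTRF models, and the data, the function names `r_squared` and `partition_variance`, and the variables `sparc_like` and `phoneme_like` are all hypothetical stand-ins. The idea is that a feature set's unique contribution is the drop in explained variance when it is removed from the joint model.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def partition_variance(A, B, y):
    """Split explained variance into parts unique to A, unique to B, and shared."""
    r2_a = r_squared(A, y)
    r2_b = r_squared(B, y)
    r2_full = r_squared(np.hstack([A, B]), y)
    return {
        "unique_a": r2_full - r2_b,      # lost when A is removed from the joint model
        "unique_b": r2_full - r2_a,      # lost when B is removed from the joint model
        "shared": r2_a + r2_b - r2_full, # explainable by either feature set
    }

# Synthetic data (assumption): continuous features drive the signal, and the
# categorical feature is only a coarse summary of the first continuous column.
rng = np.random.default_rng(0)
n = 2000
sparc_like = rng.normal(size=(n, 3))
phoneme_like = (sparc_like[:, :1] > 0).astype(float)
emg = sparc_like @ np.array([1.0, 0.5, -0.8]) + 0.5 * rng.normal(size=n)

parts = partition_variance(sparc_like, phoneme_like, emg)
print(parts)
```

In this toy setup the categorical feature carries no information beyond the continuous ones, so its unique share is close to zero while the continuous features keep a large unique share. This mirrors the qualitative pattern the summary reports, not its numbers.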
Key takeaway
For machine learning engineers developing silent-speech interfaces, this research indicates that Speech Articulatory Coding (SPARC) features are a more robust and interpretable intermediate target than phoneme representations. Consider using SPARC features in sEMG-based models to improve prediction accuracy and to see more clearly which articulatory activity each electrode captures, including in subvocal speech.
Key insights
SPARC features predict sEMG better than phonemes across speech modes, supporting silent-speech modeling.
Principles
- SPARC features predict sEMG envelopes more accurately than phoneme one-hot features.
- Articulatory activity is detectable in subvocal speech.
- mTRF weights reveal interpretable anatomical links.
Method
Elastic-net multivariate temporal response function (mTRF) with sentence-level cross-validation was used to predict sEMG envelopes from SPARC and phoneme features.
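A minimal sketch of this kind of analysis, under stated assumptions, is shown below. It is not the authors' code: the data are synthetic, the helper names (`lag_features`, `fit_elastic_net`, `pearson_columns`, `cross_validate`) and all hyperparameters are hypothetical, the elastic net is solved with plain proximal gradient descent in NumPy, and Pearson correlation is assumed as the accuracy measure. It shows the three ingredients the method names: time-lagged features, an elastic-net penalty, and folds that hold out whole sentences.

```python
import numpy as np

def lag_features(X, lags):
    """Stack time-shifted copies of X (time x features), zero-padded at the edges."""
    n, d = X.shape
    out = np.zeros((n, d * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.zeros_like(X)
        if lag > 0:
            shifted[lag:] = X[:-lag]   # feature at time t - lag predicts output at t
        elif lag < 0:
            shifted[:lag] = X[-lag:]
        else:
            shifted[:] = X
        out[:, i * d:(i + 1) * d] = shifted
    return out

def fit_elastic_net(X, Y, alpha=0.01, l1_ratio=0.5, n_iter=500):
    """Proximal gradient descent on
    (1/2n)||Y - XW||^2 + alpha*l1_ratio*||W||_1 + 0.5*alpha*(1-l1_ratio)*||W||^2.
    No intercept is fitted, so inputs and outputs are assumed to be centered."""
    n = X.shape[0]
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + l2)  # 1 / Lipschitz constant
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) / n + l2 * W
        W = W - step * grad
        W = np.sign(W) * np.maximum(np.abs(W) - step * l1, 0.0)  # soft threshold
    return W

def pearson_columns(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))

def cross_validate(sentences, lags, n_folds=4, seed=0):
    """Sentence-level CV: every fold holds out whole sentences, never partial ones."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(sentences)), n_folds)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx.tolist())
        train = [s for i, s in enumerate(sentences) if i not in held_out]
        test = [sentences[i] for i in test_idx]
        # Lag each sentence separately so no lag window crosses a sentence boundary.
        X_tr = np.vstack([lag_features(x, lags) for x, _ in train])
        Y_tr = np.vstack([y for _, y in train])
        X_te = np.vstack([lag_features(x, lags) for x, _ in test])
        Y_te = np.vstack([y for _, y in test])
        W = fit_elastic_net(X_tr, Y_tr)
        scores.append(pearson_columns(X_te @ W, Y_te))
    return np.mean(scores, axis=0)  # mean held-out correlation per output channel

# Synthetic demo (assumption): 12 "sentences", 4 features, 2 "electrodes".
rng = np.random.default_rng(1)
lags = [0, 1, 2]
W_true = rng.normal(size=(4 * len(lags), 2))
sentences = []
for _ in range(12):
    x = rng.normal(size=(200, 4))
    y = lag_features(x, lags) @ W_true + 0.5 * rng.normal(size=(200, 2))
    sentences.append((x, y))

accuracy = cross_validate(sentences, lags)
print(accuracy)
```

Splitting by sentence keeps temporally adjacent samples from the same utterance out of both the training and test sets at once, which would otherwise inflate held-out accuracy. In a real analysis the penalty strength and L1/L2 mix would normally be tuned inside the training folds, a step this sketch omits.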
In practice
- Use SPARC for sEMG-based silent-speech modeling.
- Consider mTRF for analyzing how speech features map onto muscle activity at each electrode.
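One way to read a fitted mTRF weight matrix, sketched here under assumptions, is to collapse the weights over time lags and ask which input feature each electrode depends on most. The function name, the feature names, the lag-major column layout, and the weight values below are all hypothetical; the source does not specify how the authors summarized their weights.

```python
import numpy as np

def strongest_feature_per_electrode(W, n_lags, feature_names):
    """W has shape (n_lags * n_features, n_electrodes), with columns of the
    lagged design matrix ordered lag-major (all features at lag 0, then lag 1, ...)."""
    n_features = len(feature_names)
    W3 = W.reshape(n_lags, n_features, -1)     # lags x features x electrodes
    strength = np.sqrt((W3 ** 2).sum(axis=0))  # L2 norm over lags
    best = strength.argmax(axis=0)             # strongest feature index per electrode
    return [feature_names[i] for i in best], strength

# Hypothetical articulatory feature names and hand-set weights for illustration.
feature_names = ["lip_aperture", "tongue_tip", "jaw"]
n_lags = 3
W = np.zeros((n_lags * len(feature_names), 2))
W[1 * 3 + 1, 0] = 2.0    # electrode 0 follows tongue_tip at lag 1
W[0 * 3 + 2, 1] = -1.5   # electrode 1 follows jaw at lag 0

names, strength = strongest_feature_per_electrode(W, n_lags, feature_names)
print(names)
```

Taking the norm over lags ignores the sign and timing of each weight, so it is only a coarse summary. Plotting the full lag profile per feature and electrode preserves that information.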
Topics
- Speech Articulatory Coding
- Surface Electromyography
- Silent Speech Modeling
- Multivariate Temporal Response Function
- Speech Encoding Accuracy
Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.