Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features

2026-04-20 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study of twenty-four subjects tested whether Speech Articulatory Coding (SPARC) features can linearly predict surface electromyography (sEMG) envelopes during aloud, mimed, and subvocal speech. Using elastic-net multivariate temporal response function (mTRF) models with sentence-level cross-validation, the authors found that SPARC features predicted sEMG more accurately than one-hot phoneme representations at most electrodes and in all three speech modes. Aloud and mimed speech performed comparably, and articulatory activity remained detectable above chance in subvocal speech. Variance partitioning showed a significant unique contribution from SPARC and only a minimal unique contribution from phoneme features. The mTRF weight patterns mapped electrode sites to articulatory movements in an anatomically interpretable way, and these mappings were consistent across speech modes. The results support SPARC as a robust and interpretable intermediate target for sEMG-based silent-speech modeling.
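
The variance-partitioning step can be illustrated with a small sketch. It assumes the common formulation in which a feature set's unique contribution is the joint model's explained variance minus that of the model fit without it; the summary does not say which accuracy metric the study partitioned, and the function name and numbers below are purely illustrative, not results from the paper.

```python
def partition_variance(r2_a, r2_b, r2_joint):
    """Split explained variance into unique and shared parts for two feature sets.

    r2_a, r2_b: held-out R^2 of models fit on feature set A or B alone.
    r2_joint:   held-out R^2 of a model fit on A and B together.
    """
    return {
        "unique_a": r2_joint - r2_b,       # what A adds on top of B
        "unique_b": r2_joint - r2_a,       # what B adds on top of A
        "shared": r2_a + r2_b - r2_joint,  # variance either set can explain
    }


# Illustrative values only (hypothetical, not taken from the study):
# A stands in for SPARC, B for one-hot phoneme features.
parts = partition_variance(r2_a=0.30, r2_b=0.20, r2_joint=0.31)
```

With numbers shaped like these, most of what the phoneme model explains falls into the shared term, which is the pattern the summary describes: a large unique SPARC share and a small unique phoneme share.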

Key takeaway

For machine learning engineers building silent-speech interfaces, this study suggests that Speech Articulatory Coding (SPARC) features are a more robust and interpretable intermediate target than phoneme representations. Consider using SPARC features as the intermediate target in sEMG-based models: in this study they tracked muscle activity more accurately than phoneme labels and produced anatomically interpretable weights, even for subvocal speech.

Key insights

SPARC features predict sEMG better than phonemes across speech modes, supporting silent-speech modeling.

Principles

SPARC features predict sEMG envelopes better than one-hot phoneme features.
Articulatory activity is detectable in subvocal speech.
mTRF weights reveal interpretable anatomical links.
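
One way to read such weights, sketched under assumptions: if an electrode's mTRF weights are stored as a flat list ordered lag by lag, summing absolute weight over lags gives a rough per-feature ranking for that electrode. The function name, the weight layout, and the feature names below are hypothetical and are not the study's actual SPARC channel names.

```python
def rank_features(weights, n_lags, feature_names):
    """Rank features by total absolute mTRF weight across lags.

    weights: flat list for one electrode, ordered lag-major, i.e.
             [lag0_feat0, lag0_feat1, ..., lag1_feat0, ...] (assumed layout).
    """
    d = len(feature_names)
    if len(weights) != n_lags * d:
        raise ValueError("weights must have n_lags * n_features entries")
    totals = {name: 0.0 for name in feature_names}
    for lag in range(n_lags):
        for j, name in enumerate(feature_names):
            totals[name] += abs(weights[lag * d + j])
    # Largest total first: the feature this electrode tracks most strongly.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weights for a single electrode with two lags and three features.
names = ["lip_aperture", "tongue_tip_height", "jaw_height"]
ranking = rank_features([0.1, -0.5, 0.0, 0.2, -0.4, 0.05], n_lags=2, feature_names=names)
```

Comparing such rankings for the same electrode across aloud, mimed, and subvocal recordings is one simple check of whether the electrode-to-articulator mapping stays consistent.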

Method

Elastic-net multivariate temporal response function (mTRF) with sentence-level cross-validation was used to predict sEMG envelopes from SPARC and phoneme features.
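
The pipeline described above can be sketched in dependency-free Python. This is a minimal illustration and not the authors' implementation: the lag set, penalty values, fold assignment, and toy data are all assumptions, signals are assumed to be centered so no intercept is fit, and a real analysis would more likely rely on an established numerical library. The key points it shows are that lagged features are built within each sentence and that whole sentences are held out together.

```python
import random


def lag_features(x, lags):
    """Stack time-shifted copies of x (list of feature rows), zero-padded at edges."""
    n, d = len(x), len(x[0])
    rows = []
    for t in range(n):
        row = []
        for lag in lags:
            src = t - lag  # feature sample that precedes the response by `lag` samples
            row.extend(x[src] if 0 <= src < n else [0.0] * d)
        rows.append(row)
    return rows


def soft_threshold(z, g):
    return z - g if z > g else z + g if z < -g else 0.0


def elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=100):
    """Coordinate descent on (1/2n)||y - Xw||^2 + alpha*l1_ratio*|w|_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2, without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starting from w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j]'s own fit.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def stack(sentences, lags):
    """Lag each sentence separately so lags never cross sentence boundaries."""
    X = [row for feats, _ in sentences for row in lag_features(feats, lags)]
    y = [v for _, env in sentences for v in env]
    return X, y


def sentence_cv(sentences, lags, n_folds=4, alpha=0.01, l1_ratio=0.5):
    """Mean held-out correlation, holding out whole sentences in each fold."""
    scores = []
    for k in range(n_folds):
        train = [s for i, s in enumerate(sentences) if i % n_folds != k]
        test = [s for i, s in enumerate(sentences) if i % n_folds == k]
        X_tr, y_tr = stack(train, lags)
        X_te, y_te = stack(test, lags)
        w = elastic_net(X_tr, y_tr, alpha=alpha, l1_ratio=l1_ratio)
        pred = [sum(wj * xj for wj, xj in zip(w, row)) for row in X_te]
        scores.append(pearson(pred, y_te))
    return sum(scores) / len(scores)


def make_toy_sentences(seed=0, n_sentences=8, n_samples=60):
    """Synthetic stand-in: two features drive one centered 'envelope' at lags 1 and 2."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_sentences):
        x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_samples)]
        env = []
        for t in range(n_samples):
            a = x[t - 1][0] if t >= 1 else 0.0
            b = x[t - 2][1] if t >= 2 else 0.0
            env.append(0.8 * a + 0.3 * b + rng.gauss(0, 0.1))
        sentences.append((x, env))
    return sentences
```

Calling `sentence_cv(make_toy_sentences(), lags=[0, 1, 2, 3])` returns the mean held-out correlation on the toy data. Running the same routine once with articulatory features and once with one-hot phoneme features, per electrode, would mirror the comparison the study reports.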

In practice

Use SPARC for sEMG-based silent-speech modeling.
Consider mTRF for analyzing how speech features map onto muscle activity.

Topics

Speech Articulatory Coding
Surface Electromyography
Silent Speech Modeling
Multivariate Temporal Response Function
Speech Encoding Accuracy

Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.