Segment-Level Mandarin Chinese Speech-Based Cognitive Impairment Detection via an Autoencoder with Contrastive Learning
Summary
A new segment-level representation learning framework has been developed for speech-based cognitive impairment detection in Mandarin Chinese. To address limited labeled data and cross-dataset variability, it divides speech recordings into 5-second segments and converts them into log-Mel spectrograms. It integrates a GRU-based autoencoder with supervised contrastive learning and combines offline and online spectrogram augmentation strategies. Experiments on four independent Mandarin Chinese speech datasets showed stable and competitive performance, with overall accuracy exceeding 96%. The highest accuracy, 98.61%, was obtained on the Ye dataset; on the NCMMSC2021 dataset, which poses the more challenging three-class classification task, accuracy reached 96.83%. Ablation studies confirmed that supervised contrastive learning is crucial for performance, with offline augmentation providing complementary benefits.
Key takeaway
For AI scientists developing speech-based diagnostic tools, this framework offers a robust approach to cognitive impairment detection, especially in low-resource Mandarin Chinese settings. Consider segment-level modeling with a GRU autoencoder and supervised contrastive learning. In the reported ablations, supervised contrastive learning was the decisive component and offline augmentation added complementary gains, so combining offline and online spectrogram augmentation is a reasonable way to improve model stability and discriminative power, including in challenging multi-class scenarios.
Key insights
Segment-level speech representation with autoencoders and contrastive learning improves cognitive impairment detection in low-resource settings.
Principles
- Combine reconstruction and contrastive objectives.
- Augment spectrograms both offline and online.
- Segment speech to increase training data.
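The first principle, combining reconstruction and contrastive objectives, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the summary gives no loss formula, so the mean-squared-error reconstruction term, the weighting factor `lam`, the `temperature` value, and the function names are all assumptions chosen for illustration. The contrastive term follows the usual supervised contrastive formulation, where segments sharing a label act as positives for each other.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings z with shape (n, d).

    Needs at least two samples in the batch.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-length embeddings
    sim = z @ z.T / temperature                        # scaled cosine similarity
    self_mask = np.eye(len(labels), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # never contrast a sample with itself
    # Row-wise log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's label.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                  # skip anchors with no positive
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_log_prob[valid] / n_pos[valid]))

def combined_loss(x, x_hat, z, labels, lam=1.0, temperature=0.1):
    """Reconstruction MSE plus lam times the supervised contrastive term.

    lam and temperature are illustrative values, not taken from the paper.
    """
    recon = float(np.mean((np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2))
    return recon + lam * supcon_loss(z, labels, temperature)
```

Here `x` and `x_hat` stand for an input spectrogram batch and its autoencoder reconstruction, and `z` for the latent vectors. Embeddings that cluster by label yield a lower contrastive term than embeddings that mix labels, which is the pressure that shapes the latent space.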
Method
The framework segments speech into 5-second log-Mel spectrograms, uses a GRU autoencoder for reconstruction, and applies supervised contrastive learning with combined offline/online spectrogram augmentation to enhance discriminative latent representations.
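The segmentation step can be sketched as follows. The summary only states that recordings are divided into 5-second segments, so several details here are assumptions: segments are non-overlapping, a trailing remainder shorter than 5 seconds is dropped, and each segment inherits the label of its recording. The function names are hypothetical, and the log-Mel conversion is left to an audio library of your choice.

```python
import numpy as np

def segment_waveform(wave, sample_rate, seg_seconds=5.0):
    """Split a 1-D waveform into non-overlapping fixed-length segments.

    Returns an array of shape (n_segments, seg_len). Any trailing audio
    shorter than one full segment is dropped (an assumption).
    """
    wave = np.asarray(wave)
    seg_len = int(round(seg_seconds * sample_rate))
    n_segments = len(wave) // seg_len
    return wave[: n_segments * seg_len].reshape(n_segments, seg_len)

def segment_labels(recording_label, n_segments):
    """Give every segment the label of the recording it came from (an assumption)."""
    return [recording_label] * n_segments
```

For example, a 12-second recording sampled at 16 kHz yields two segments of 80,000 samples each, and the last 2 seconds are discarded. A recording shorter than 5 seconds yields no segments under this scheme.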
In practice
- Apply 5-second segmentation for low-resource speech.
- Use SpecAugment for both offline and online data views.
- Implement GRU autoencoders with supervised contrastive loss.
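A minimal sketch of SpecAugment-style masking used for offline and online views is shown below. The mask widths, the number of offline copies, and the helper names are assumptions for illustration; the summary does not specify the paper's augmentation settings. Here "offline" means augmented copies are generated once and stored alongside the originals, while "online" means fresh random views are drawn each time a sample is loaded during training.

```python
import numpy as np

def spec_augment(spec, rng, max_freq_width=8, max_time_width=20):
    """Zero out one random frequency band and one random time band.

    spec has shape (n_mels, n_frames). Widths are sampled uniformly from
    0 up to the given maxima (illustrative values).
    """
    out = np.array(spec, dtype=float, copy=True)       # leave the input untouched
    n_mels, n_frames = out.shape
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0                            # frequency mask
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0                            # time mask
    return out

def build_offline_set(specs, labels, copies, rng):
    """Offline augmentation: store each original plus `copies` masked versions."""
    aug_specs, aug_labels = [], []
    for spec, label in zip(specs, labels):
        aug_specs.append(np.asarray(spec, dtype=float))
        aug_labels.append(label)
        for _ in range(copies):
            aug_specs.append(spec_augment(spec, rng))
            aug_labels.append(label)
    return aug_specs, aug_labels

def online_views(spec, rng):
    """Online augmentation: draw two fresh random views of one spectrogram."""
    return spec_augment(spec, rng), spec_augment(spec, rng)
```

Passing a seeded `np.random.default_rng(seed)` makes the offline set reproducible, while the online views can share a generator that advances across training steps so each epoch sees different masks.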
Topics
- Cognitive Impairment Detection
- Speech Processing
- Autoencoders
- Contrastive Learning
- Spectrogram Augmentation
- Mandarin Chinese
Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.