InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization
Summary
InfoShield is a framework for learning privacy-preserving speech representations for mental health screening, specifically depression detection. It targets a key barrier to clinical adoption: users' concern that voice data exposes demographic information. The framework minimizes the mutual information between speech representations and sensitive attributes such as gender and age while preserving diagnostic accuracy. Its key component, TimeAwareMINE, uses cross-modal attention to address the misalignment between sequential speech frames and static attributes, which yields more reliable mutual information estimates. In experiments on the Androids Corpus (118 Italian speakers), InfoShield reduced gender inference accuracy from 92.6% to 55.5% and age inference accuracy from 55.7% to 30.3% while achieving an F1-score of 0.784 for depression classification. This corresponds to a 6% utility loss relative to an oracle, and it outperforms both the prior state of the art (F1 = 0.723) and Differential Privacy baselines.
Key takeaway
For AI Security Engineers or Research Scientists building speech-based mental health screening tools, InfoShield offers an approach, validated on the Androids Corpus, for mitigating demographic information leakage from voice data. If that leakage is a concern, consider its information-theoretic optimization with TimeAwareMINE. In the reported experiments, the framework substantially reduces attribute inference while preserving diagnostic accuracy, giving a better privacy-utility balance than traditional Differential Privacy or adversarial methods for clinical deployment.
Key insights
InfoShield balances speech-based mental health screening utility with privacy by minimizing mutual information between representations and sensitive attributes.
Principles
- Information-theoretic optimization can selectively remove sensitive attributes from speech representations.
- Cross-modal attention is crucial for accurate mutual information estimation in sequential speech.
- Targeted MI minimization offers superior privacy-utility balance over global noise injection.
Method
InfoShield optimizes a loss function combining depression prediction utility, Variational Information Bottleneck (VIB) compression, and TimeAwareMINE-based mutual information minimization. TimeAwareMINE uses cross-modal attention to align acoustic frames with attribute embeddings.
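One way to write such a combined objective is sketched below. The summary only states that the three components are combined, so the weights \beta and \lambda, the symbol names, and the exact form of each term are assumptions for illustration. The second line is the standard Donsker-Varadhan bound used by MINE-style estimators, which TimeAwareMINE is assumed here to build on, with T_\theta denoting the critic network.

```latex
\mathcal{L}
  = \mathcal{L}_{\mathrm{dep}}(y, \hat{y})
  + \beta \, \mathrm{KL}\!\big(q_\phi(z \mid x) \,\|\, p(z)\big)
  + \lambda \sum_{a \in \{\mathrm{gender},\, \mathrm{age}\}} \hat{I}_\theta(Z; a)

\hat{I}_\theta(Z; a)
  = \mathbb{E}_{p(z, a)}\big[T_\theta(z, a)\big]
  - \log \mathbb{E}_{p(z)\, p(a)}\big[e^{T_\theta(z, a)}\big]
```

Under this reading, the first term keeps the representation useful for depression prediction, the KL term compresses it toward a prior, and the last term penalizes whatever information about each sensitive attribute the critic can still detect.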
In practice
- Apply TimeAwareMINE's cross-modal attention to align sequential speech frames with static attributes when estimating mutual information.
- Integrate VIB compression with targeted mutual information minimization to obtain a robust privacy-utility balance in speech systems.
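The alignment idea in the first point can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the critic architecture, the names `make_params`, `attention_pool`, `critic`, and `dv_bound`, and all dimensions are hypothetical. A static attribute embedding acts as the attention query over a variable-length sequence of frames, and the pooled result feeds a Donsker-Varadhan (MINE-style) mutual information estimate.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def make_params(d, d_a, d_k, d_h, rng):
    """Random weights for the hypothetical critic (illustration only)."""
    return {
        "W_q": rng.normal(scale=0.1, size=(d_a, d_k)),      # attribute -> query
        "W_k": rng.normal(scale=0.1, size=(d, d_k)),        # frame -> key
        "W_h": rng.normal(scale=0.1, size=(d + d_a, d_h)),  # hidden layer
        "w_out": rng.normal(scale=0.1, size=d_h),           # scalar output
    }


def attention_pool(frames, attr_emb, params):
    """Pool T frames of shape (T, d) into one vector, using the static
    attribute embedding as the attention query."""
    q = attr_emb @ params["W_q"]                     # (d_k,)
    k = frames @ params["W_k"]                       # (T, d_k)
    weights = softmax(k @ q / np.sqrt(q.shape[0]))   # (T,), sums to 1
    return weights @ frames, weights                 # (d,), (T,)


def critic(frames, attr_emb, params):
    """Scalar critic score T(z, a) for one (sequence, attribute) pair."""
    pooled, _ = attention_pool(frames, attr_emb, params)
    hidden = np.tanh(np.concatenate([pooled, attr_emb]) @ params["W_h"])
    return float(hidden @ params["w_out"])


def dv_bound(batch_frames, batch_attrs, params, rng):
    """Donsker-Varadhan estimate: mean critic score on true pairs minus
    log-mean-exp of the score on shuffled (product-of-marginals) pairs."""
    joint = np.array(
        [critic(f, a, params) for f, a in zip(batch_frames, batch_attrs)]
    )
    perm = rng.permutation(len(batch_attrs))
    marginal = np.array(
        [critic(batch_frames[i], batch_attrs[j], params)
         for i, j in enumerate(perm)]
    )
    m = marginal.max()  # subtract the max for a stable log-mean-exp
    log_mean_exp = m + np.log(np.mean(np.exp(marginal - m)))
    return float(joint.mean() - log_mean_exp)


rng = np.random.default_rng(0)
params = make_params(d=8, d_a=4, d_k=6, d_h=5, rng=rng)
# Three utterances of different lengths, each paired with one attribute embedding.
batch_frames = [rng.normal(size=(T, 8)) for T in (20, 35, 27)]
batch_attrs = rng.normal(size=(3, 4))
estimate = dv_bound(batch_frames, batch_attrs, params, rng)
```

In a MINE-style setup the critic would typically be trained to maximize this bound while the encoder is trained to minimize it. How InfoShield schedules those two updates is not stated in this summary.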
Topics
- Mental Health Screening
- Speech Privacy
- Information-Theoretic Optimization
- TimeAwareMINE
- Attribute Inference
- Variational Information Bottleneck
Best for: AI Scientist, AI Security Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.