InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, AI in Mental Health · Depth: Expert, long

Summary

InfoShield is a framework for learning privacy-preserving speech representations for mental health screening, with depression detection as the target task. It addresses a barrier to clinical adoption: users' concern that voice data exposes demographic information. The framework minimizes the mutual information between speech representations and sensitive attributes such as gender and age while preserving diagnostic accuracy. Its key component, TimeAwareMINE, uses cross-modal attention to resolve the misalignment between time-varying speech frames and static attributes, which yields more reliable mutual information estimates. On the Androids Corpus (118 Italian speakers), InfoShield reduced gender inference accuracy from 92.6% to 55.5% and age inference accuracy from 55.7% to 30.3%, while reaching an F1-score of 0.784 for depression classification. That is a reported utility loss of only 6% relative to an oracle, and it exceeds both the prior state of the art (F1 = 0.723) and Differential Privacy baselines.
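
The reported figures can be put on a common footing with a few lines of arithmetic. The sketch below only restates numbers from the summary as percentage-point drops and a relative F1 gain. The variable names are illustrative, and no chance-level baseline is computed because the summary does not state the number of age classes.

```python
# Attribute inference accuracy (%) before and after InfoShield, as reported.
reported = {
    "gender": (92.6, 55.5),
    "age": (55.7, 30.3),
}
f1_infoshield, f1_prior = 0.784, 0.723

# Drop in attacker accuracy, in percentage points.
drops = {name: round(before - after, 1) for name, (before, after) in reported.items()}

# F1 gain over the prior state of the art, relative to the prior score.
relative_f1_gain = (f1_infoshield - f1_prior) / f1_prior

print(drops)                      # {'gender': 37.1, 'age': 25.4}
print(f"{relative_f1_gain:.1%}")  # 8.4%
```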

Key takeaway

For AI Security Engineers or Research Scientists developing speech-based mental health screening tools, InfoShield offers an approach to demographic leakage that has been validated on the Androids Corpus. If you are concerned about demographic information leaking from voice data, consider implementing its information-theoretic optimization with TimeAwareMINE. In the reported experiments, the framework sharply reduces attribute inference while preserving diagnostic accuracy, giving a better privacy-utility balance for clinical deployment than traditional Differential Privacy or adversarial methods.

Key insights

InfoShield balances speech-based mental health screening utility with privacy by minimizing mutual information between representations and sensitive attributes.

Principles

Information-theoretic optimization can selectively remove sensitive attributes from speech representations.
Cross-modal attention is crucial for accurate mutual information estimation in sequential speech.
Targeted MI minimization offers superior privacy-utility balance over global noise injection.
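
To make the mutual information estimation behind these principles concrete, here is a minimal sketch of the Donsker-Varadhan lower bound that MINE-style estimators maximize. It is not the TimeAwareMINE implementation: the summary does not say which bound the paper uses, the critic is a fixed hypothetical function rather than a trained network, and the data is a synthetic toy example.

```python
import math
import random

def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]."""
    joint_term = sum(joint_scores) / len(joint_scores)
    # Log-mean-exp with max subtraction for numerical stability.
    m = max(marginal_scores)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores)
    )
    return joint_term - log_mean_exp

def critic(z, s):
    """Hypothetical fixed critic T(z, s); MINE would train a neural network here."""
    return z * s

rng = random.Random(0)
n = 2000
# Toy data: s is a binary attribute coded as -1/+1, and z is a
# one-dimensional "representation" that leaks s through its mean.
attrs = [rng.choice([-1.0, 1.0]) for _ in range(n)]
reps = [s + 0.5 * rng.gauss(0.0, 1.0) for s in attrs]

# Shuffling the attributes breaks the pairing, which approximates
# sampling from the product of the marginals.
shuffled = attrs[:]
rng.shuffle(shuffled)

joint_scores = [critic(z, s) for z, s in zip(reps, attrs)]
marginal_scores = [critic(z, s) for z, s in zip(reps, shuffled)]
mi_lower_bound = dv_lower_bound(joint_scores, marginal_scores)
print(f"DV lower bound on I(z; s): {mi_lower_bound:.3f} nats")
```

In a privacy setting the critic is trained to push this bound up, while the encoder is trained to push it down, so that the representation carries less information about the attribute.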

Method

InfoShield optimizes a loss function combining depression prediction utility, Variational Information Bottleneck (VIB) compression, and TimeAwareMINE-based mutual information minimization. TimeAwareMINE uses cross-modal attention to align acoustic frames with attribute embeddings.
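
One way to read this objective is as a weighted sum of three terms. The sketch below assumes exactly that form. The weights `beta` and `lam`, the function names, and the choice of binary cross-entropy and a Gaussian KL term are illustrative assumptions, not details confirmed by the summary.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Utility term: cross-entropy between predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def vib_kl(mu, log_var):
    """VIB compression term: KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def infoshield_style_loss(p, y, mu, log_var, mi_estimates, beta=1e-3, lam=1.0):
    """Assumed form: utility + beta * compression + lam * sum of MI estimates.

    mi_estimates maps each sensitive attribute to an estimate of its mutual
    information with the representation (for example from a MINE-style critic).
    beta and lam are hypothetical trade-off weights.
    """
    utility = binary_cross_entropy(p, y)
    compression = vib_kl(mu, log_var)
    privacy = sum(mi_estimates.values())
    return utility + beta * compression + lam * privacy

# Example call with made-up values for a single utterance.
loss = infoshield_style_loss(
    p=0.8, y=1, mu=[0.1, -0.2], log_var=[0.0, 0.0],
    mi_estimates={"gender": 0.05, "age": 0.10},
)
print(f"{loss:.4f}")
```

Raising `lam` trades diagnostic utility for lower estimated leakage, which is the privacy-utility balance the summary describes.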

In practice

Apply TimeAwareMINE's cross-modal attention to align sequential speech with static attributes before estimating mutual information.
Integrate VIB compression with targeted mutual information minimization for a robust privacy-utility balance in speech systems.
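
The alignment step can be pictured as attention pooling: a static attribute embedding attends over a sequence of acoustic frames and produces one attribute-aware summary vector. The sketch below assumes the attribute embedding acts as the query, the frames act as keys and values, and both share one dimension. A real model would add learned projections, and the summary does not confirm this exact arrangement.

```python
import math

def attention_pool(frames, attr_embedding):
    """Scaled dot-product attention: attribute embedding as query, frames as keys and values."""
    d = len(attr_embedding)
    scores = [
        sum(f_k * a_k for f_k, a_k in zip(frame, attr_embedding)) / math.sqrt(d)
        for frame in frames
    ]
    # Softmax over time steps, with max subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the frames gives a single fixed-size vector.
    pooled = [sum(w * frame[k] for w, frame in zip(weights, frames)) for k in range(d)]
    return weights, pooled

# Three toy 2-D "acoustic frames" and one toy attribute embedding.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attr_embedding = [2.0, 0.0]
weights, pooled = attention_pool(frames, attr_embedding)
print([round(w, 3) for w in weights], [round(v, 3) for v in pooled])
```

The pooled vector and the attribute embedding now have the same shape, so a mutual information critic can score them as a pair without comparing a whole sequence to a single static label.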

Topics

Mental Health Screening
Speech Privacy
Information-Theoretic Optimization
TimeAwareMINE
Attribute Inference
Variational Information Bottleneck

Best for: AI Scientist, AI Security Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.