InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, AI in Mental Health) · Depth: Expert, long

Summary

InfoShield is a framework for learning privacy-preserving speech representations for mental health screening, with depression detection as the target task. It addresses a barrier to clinical adoption: users worry that their voice data exposes demographic information. The framework minimizes the mutual information between the speech representations and sensitive attributes (gender and age) while preserving diagnostic accuracy. Its key component, TimeAwareMINE, uses cross-modal attention to resolve the mismatch between sequential, frame-level speech and static speaker attributes, which yields more reliable mutual information estimates. In experiments on the Androids Corpus (118 Italian speakers), InfoShield reduced gender inference accuracy from 92.6% to 55.5% (a drop of 37.1 percentage points) and age inference accuracy from 55.7% to 30.3% (a drop of 25.4 points), while reaching an F1-score of 0.784 for depression classification. That is a 6% utility loss relative to an oracle, and it outperforms both the prior state of the art (F1 = 0.723) and Differential Privacy baselines.

Key takeaway

For AI security engineers and research scientists building speech-based mental health screening tools, InfoShield offers an experimentally evaluated way to reduce demographic leakage from voice data. If that leakage is a concern, consider adopting its information-theoretic optimization with TimeAwareMINE. In the reported experiments it sharply reduces attribute inference while preserving diagnostic accuracy, giving a better privacy-utility balance for clinical deployment than Differential Privacy or adversarial methods.

Key insights

InfoShield balances speech-based mental health screening utility with privacy by minimizing mutual information between representations and sensitive attributes.
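
To make "minimizing mutual information" concrete, here is a minimal sketch that computes I(Z; A) for a toy discrete case. The two joint tables and the function name `mutual_information` are illustrative assumptions, not data or code from the paper; InfoShield works with continuous representations and a neural estimator rather than probability tables.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(Z; A) in nats for a discrete joint probability table.

    Rows index (hypothetical) representation bins, columns index attribute values.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize to a probability table
    pz = joint.sum(axis=1, keepdims=True)       # marginal over representation bins
    pa = joint.sum(axis=0, keepdims=True)       # marginal over attribute values
    independent = pz @ pa                       # what the joint would be if Z and A were independent
    mask = joint > 0                            # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))

# Toy tables (assumed for illustration): a representation that tracks the
# attribute closely, and one that is statistically independent of it.
leaky = [[0.45, 0.05],
         [0.05, 0.45]]
shielded = [[0.25, 0.25],
            [0.25, 0.25]]

mi_leaky = mutual_information(leaky)
mi_shielded = mutual_information(shielded)
print(f"leaky: {mi_leaky:.3f} nats, shielded: {mi_shielded:.3f} nats")
```

When the representation is independent of the attribute, the mutual information is zero and no classifier can infer the attribute from it better than by guessing from the attribute's base rate. That is the state the privacy term pushes toward.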

Method

InfoShield optimizes a loss function combining depression prediction utility, Variational Information Bottleneck (VIB) compression, and TimeAwareMINE-based mutual information minimization. TimeAwareMINE uses cross-modal attention to align acoustic frames with attribute embeddings.
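
The three loss terms can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the linear critic, the attention pooling layout, the weights `beta` and `lam`, and every function name are hypothetical. The mutual information term uses the Donsker-Varadhan lower bound from standard MINE, with the attribute embedding acting as an attention query over the acoustic frames.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for the depression prediction (utility) term."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def vib_kl(mu, log_var):
    """VIB compression term: KL(N(mu, diag(exp(log_var))) || N(0, I)), batch mean."""
    return float(np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)))

def attention_pool(frames, attr_emb):
    """Scaled dot-product attention with the attribute embedding as the query.

    frames: (B, T, D) acoustic frames, attr_emb: (B, D) -> pooled (B, D).
    """
    d = frames.shape[-1]
    scores = np.einsum("btd,bd->bt", frames, attr_emb) / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return np.einsum("bt,btd->bd", weights, frames)

def critic(frames, attr_emb, w):
    """Hypothetical linear critic T(z, a) on [attention-pooled frames, attribute]."""
    pooled = attention_pool(frames, attr_emb)
    return np.concatenate([pooled, attr_emb], axis=1) @ w

def dv_lower_bound(frames, attr_emb, w, rng):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]."""
    joint = critic(frames, attr_emb, w)
    shuffled = attr_emb[rng.permutation(len(attr_emb))]   # break the pairing
    marginal = critic(frames, shuffled, w)
    return float(joint.mean() - np.log(np.mean(np.exp(marginal))))

def total_loss(p_dep, y_dep, mu, log_var, mi_est, beta=0.01, lam=1.0):
    """Utility + beta * compression + lam * estimated leakage (weights assumed)."""
    return bce(p_dep, y_dep) + beta * vib_kl(mu, log_var) + lam * mi_est

# Toy usage with random data; shapes and values are placeholders.
rng = np.random.default_rng(0)
B, T, D = 8, 20, 4
frames = rng.normal(size=(B, T, D))
attr_emb = rng.normal(size=(B, D))
w = np.zeros(2 * D)                      # an untrained critic scores everything 0
mi_est = dv_lower_bound(frames, attr_emb, w, rng)
loss = total_loss(np.full(B, 0.5), np.ones(B), np.zeros((B, D)), np.zeros((B, D)), mi_est)
```

In a MINE-style setup the critic is trained to maximize the bound while the encoder is trained to minimize the total loss, so the two are typically updated in alternation. How InfoShield schedules those updates is not stated in this summary.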

Best for: AI Scientist, AI Security Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.