Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews
Summary
A study investigated open-weight Large Language Models (LLMs) for assessing dementia and depression severity from speech samples. The researchers used interviews with 154 German-speaking subjects and introduced an observer-based Global Depression Scale (GDS-D) aligned with the Global Deterioration Scale (GDS). Three LLMs (Mistral 3.1, DeepHermes, and Qwen3) were evaluated in two settings: zero-shot severity prediction, and LLM-based feature extraction whose outputs were fed to Support Vector Regression (SVR). Both settings used human transcripts as well as pause-enriched transcripts. In zero-shot settings the LLMs predicted depression severity effectively, with a best Mean Absolute Error (MAE) of 0.60. Dementia assessment improved substantially with structured feature extraction, reaching a best MAE of 0.78 and reducing errors by up to 35% relative to zero-shot baselines. The competitive performance of pause-enriched transcripts suggests that fully automatic screening pipelines for differential neuropsychiatric assessment are viable.
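The headline numbers are mean absolute errors (MAE) between predicted and rated severity stages, and the 35% figure is a relative reduction in MAE. A minimal sketch of both calculations, using hypothetical ratings on a 1-7 stage scale rather than any data from the study:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between rated and predicted severity stages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_error_reduction(baseline_mae, new_mae):
    """Fraction by which new_mae improves on baseline_mae (0.35 means 35% lower)."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical ratings and predictions, for illustration only.
true_stages = [2, 4, 3, 6, 5]
zero_shot_preds = [3, 6, 3, 4, 5]
svr_preds = [2.5, 4.5, 3.0, 5.0, 5.5]

mae_zero_shot = mean_absolute_error(true_stages, zero_shot_preds)
mae_svr = mean_absolute_error(true_stages, svr_preds)
print(f"zero-shot MAE={mae_zero_shot:.2f}, SVR MAE={mae_svr:.2f}, "
      f"reduction={relative_error_reduction(mae_zero_shot, mae_svr):.0%}")
```

With these made-up numbers the script prints `zero-shot MAE=1.00, SVR MAE=0.50, reduction=50%`.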
Key takeaway
For Machine Learning Engineers developing diagnostic tools, open-weight LLMs are worth considering for neuropsychiatric assessment. For dementia severity prediction, implement structured feature extraction with LLMs such as Mistral 3.1; in the study, this approach reduced errors by up to 35% compared with zero-shot prediction. For depression, zero-shot LLM prompting alone performed well. Pause-enriched transcripts are also worth exploring as the basis of fully automatic, cost-efficient screening pipelines that could make early diagnosis more accessible and scalable.
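Structured feature extraction means asking the LLM to rate a fixed set of interview characteristics and converting its reply into a numeric vector that a regressor can consume. The sketch below assumes the model is prompted to answer in JSON; the feature names and the 0-3 rating range are hypothetical placeholders, not the study's feature set.

```python
import json
import math
from typing import List

# Hypothetical feature schema: the names and the 0-3 rating range are
# illustrative assumptions, not the features used in the study.
FEATURE_NAMES = ["word_finding_difficulty", "temporal_disorientation",
                 "memory_complaints", "flat_affect"]
MIN_RATING, MAX_RATING = 0.0, 3.0


def parse_llm_features(raw_response: str, default: float = 0.0) -> List[float]:
    """Convert an LLM's JSON reply into a fixed-order numeric feature vector.

    Unparseable, missing, or non-finite values become `default`; numeric values
    outside the rating range are clamped, so every transcript yields a vector
    of the same length and order.
    """
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    vector = []
    for name in FEATURE_NAMES:
        try:
            value = float(data.get(name, default))
        except (TypeError, ValueError):
            value = default
        if not math.isfinite(value):
            value = default
        vector.append(min(max(value, MIN_RATING), MAX_RATING))
    return vector


reply = ('{"word_finding_difficulty": 2, "temporal_disorientation": "5", '
         '"memory_complaints": null}')
print(parse_llm_features(reply))
```

For the sample reply it returns `[2.0, 3.0, 0.0, 0.0]`: the out-of-range "5" is clamped to 3, while the null and missing fields fall back to the default. Keeping the vector length and order fixed matters because the downstream regressor expects the same features for every subject.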
Key insights
LLMs can effectively assess neuropsychiatric disorder severity from speech, enabling automated screening.
Principles
- Differential diagnosis benefits from structured LLM features.
- Pause-enriched transcripts are viable for automated assessment.
- Zero-shot LLMs excel in depression severity prediction.
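A pause-enriched transcript adds explicit markers for silences between the recognized words. One way to build one, assuming an ASR system that returns word-level start and end times in seconds; the 0.5 s and 2.0 s thresholds and the `<pause>`/`<long_pause>` tokens are illustrative choices, not the study's settings:

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_s, end_s) from an ASR system

# Hypothetical thresholds, chosen for illustration only.
SHORT_PAUSE_S = 0.5
LONG_PAUSE_S = 2.0


def pause_enriched_transcript(words: List[Word]) -> str:
    """Insert a pause marker wherever the silence between two words is long enough."""
    tokens: List[str] = []
    prev_end = None
    for token, start, end in words:
        if prev_end is not None:
            gap = start - prev_end  # silence since the previous word ended
            if gap >= LONG_PAUSE_S:
                tokens.append("<long_pause>")
            elif gap >= SHORT_PAUSE_S:
                tokens.append("<pause>")
        tokens.append(token)
        prev_end = end
    return " ".join(tokens)


asr_words = [("ich", 0.0, 0.3), ("weiß", 0.35, 0.7), ("nicht", 1.4, 1.8),
             ("genau", 4.0, 4.5)]
print(pause_enriched_transcript(asr_words))
```

For the example input it returns `ich weiß <pause> nicht <long_pause> genau`.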
Method
The study compares zero-shot LLM severity predictions with Support Vector Regression trained on LLM-extracted features, using both human and pause-enriched transcripts of standardized interviews.
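The second branch of this comparison trains a regressor on LLM-extracted features. The study's exact SVR configuration (kernel, hyperparameters) is not given here, so the sketch below substitutes a simplified linear epsilon-insensitive SVR trained by subgradient descent in plain Python; a real pipeline would more likely use a library implementation such as scikit-learn's `SVR`. The feature vectors and severity targets are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear model output w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X: Sequence[Sequence[float]], y: Sequence[float],
                     epsilon: float = 0.1, lam: float = 1e-3,
                     lr0: float = 0.1, steps: int = 20000) -> Tuple[List[float], float]:
    """Simplified linear SVR (epsilon-insensitive loss, L2 penalty on w).

    Minimises (lam / 2) * ||w||^2 + mean_i max(0, |y_i - (w . x_i + b)| - epsilon)
    by full-batch subgradient descent with step lr0 / sqrt(t), returning the
    best iterate seen because subgradient steps need not decrease the objective.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    best_w, best_b, best_obj = list(w), b, math.inf
    for t in range(1, steps + 1):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularised
        data_loss = 0.0
        for xi, yi in zip(X, y):
            residual = yi - predict(w, b, xi)
            excess = abs(residual) - epsilon
            if excess > 0:  # only points outside the epsilon tube contribute
                data_loss += excess
                sign = 1.0 if residual > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= sign * xi[j] / n
                grad_b -= sign / n
        objective = 0.5 * lam * sum(wj * wj for wj in w) + data_loss / n
        if objective < best_obj:
            best_w, best_b, best_obj = list(w), b, objective
        step = lr0 / math.sqrt(t)
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return best_w, best_b


# Synthetic data: two hypothetical LLM-extracted ratings per interview and a
# severity target that is an exact linear function of them.
X_train = [[0, 0], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2], [3, 3], [0, 2]]
y_train = [1 + 0.8 * a + 0.5 * c for a, c in X_train]

w, b = train_linear_svr(X_train, y_train)
train_mae = sum(abs(yi - predict(w, b, xi))
                for xi, yi in zip(X_train, y_train)) / len(y_train)
print(f"weights={w}, bias={b:.2f}, train MAE={train_mae:.3f}")
```

Keeping the best iterate rather than the last one is standard for subgradient methods, whose objective does not decrease monotonically. The epsilon tube means residuals smaller than 0.1 stages are not penalised at all, which is what distinguishes SVR from ordinary least-squares regression.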
In practice
- Implement LLM feature extraction for dementia assessment.
- Utilize pause-enriched transcripts for automated screening.
- Apply zero-shot LLMs for initial depression screening.
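For zero-shot screening, the model is prompted directly for a severity stage, and its free-text reply must be mapped to a number. A minimal sketch of that prompt-and-parse step; the prompt wording is hypothetical, the LLM call itself is left out, and the 1-7 range assumes the GDS-D mirrors the seven stages of the Global Deterioration Scale:

```python
import re
from typing import Optional

# Assumption: both scales are scored as integer stages 1-7.
MIN_STAGE, MAX_STAGE = 1, 7

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "You are a clinician. Read the interview transcript below and rate the "
    "speaker's {scale} stage from {lo} to {hi}. Answer with a single integer.\n\n"
    "Transcript:\n{transcript}\n\nStage:"
)


def build_zero_shot_prompt(transcript: str, scale: str) -> str:
    """Fill the prompt template for one transcript and one rating scale."""
    return PROMPT_TEMPLATE.format(scale=scale, lo=MIN_STAGE, hi=MAX_STAGE,
                                  transcript=transcript)


def parse_stage(reply: str) -> Optional[int]:
    """Return the first integer in the model reply if it is a valid stage, else None."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    stage = int(match.group())
    return stage if MIN_STAGE <= stage <= MAX_STAGE else None


prompt = build_zero_shot_prompt("ich weiß <pause> nicht genau", "depression (GDS-D)")
# The reply would come from whichever open-weight LLM is being evaluated.
print(parse_stage("Stage: 3 (mild symptoms)"))
```

Returning `None` for unparseable or out-of-range replies lets the caller re-prompt or flag the interview instead of silently guessing a stage.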
Topics
- Large Language Models
- Dementia Assessment
- Depression Assessment
- Neuropsychiatric Disorders
- Speech Analysis
- Zero-shot Learning
Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.