Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

2026-06-16 · Source: Computation and Language · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Mental Health & Psychological Support, Clinical Care & Medical Practice · Depth: Expert, quick

Summary

A study investigated open-weight Large Language Models (LLMs) for assessing dementia and depression severity from speech samples. The researchers used interviews with 154 German-speaking subjects and introduced an observer-based Global Depression Scale (GDS-D) aligned with the Global Deterioration Scale (GDS). Three LLMs (Mistral 3.1, DeepHermes, and Qwen3) were evaluated in two settings, zero-shot prediction and LLM-based feature extraction for Support Vector Regression, using both human and pause-enriched transcripts. The LLMs predicted depression severity effectively in zero-shot settings, with a best Mean Absolute Error (MAE) of 0.60. Dementia assessment improved significantly with structured feature extraction, reaching a best MAE of 0.78 and reducing errors by up to 35% compared to zero-shot baselines. The competitive performance of pause-enriched transcripts suggests that fully automatic screening pipelines for differential neuropsychiatric assessment are viable.

Key takeaway

Machine Learning Engineers developing diagnostic tools should consider open-weight LLMs for neuropsychiatric assessment. For dementia severity, use structured LLM feature extraction (with a model such as Mistral 3.1) feeding a Support Vector Regression model; in the study this reduced errors by up to 35% compared to zero-shot baselines, with a best MAE of 0.78. For depression severity, zero-shot prompting alone was effective, with a best MAE of 0.60. Pause-enriched transcripts performed competitively, so they are worth exploring as the basis for fully automatic, potentially cost-efficient screening pipelines that make early assessment more accessible and scalable.

Key insights

LLMs can effectively assess neuropsychiatric disorder severity from speech, enabling automated screening.

Principles

Dementia severity assessment benefits from structured LLM feature extraction.
Pause-enriched transcripts are viable for automated assessment.
Zero-shot LLMs predict depression severity well (best MAE 0.60).

Method

The method involves comparing zero-shot LLM predictions with LLM-based feature extraction for Support Vector Regression, using human and pause-enriched speech transcripts from standardized interviews.
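
The comparison above comes down to scoring each approach against clinician ratings with MAE and reporting the relative error reduction. A minimal sketch of that scoring step follows. The function names and the toy ratings are illustrative assumptions, not the study's code or data, and the LLM prompting and Support Vector Regression steps are deliberately left out.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference ratings and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_reduction(baseline_mae, new_mae):
    """Fraction by which new_mae improves on baseline_mae (0.35 means 35% lower)."""
    return (baseline_mae - new_mae) / baseline_mae


# Toy severity ratings, invented for illustration only (not study data).
clinician = [1, 2, 3, 4, 5, 2]
zero_shot = [2, 2, 5, 3, 4, 4]  # hypothetical zero-shot LLM outputs
svr_based = [1, 3, 3, 4, 4, 3]  # hypothetical SVR outputs on LLM features

mae_zero_shot = mean_absolute_error(clinician, zero_shot)
mae_svr = mean_absolute_error(clinician, svr_based)

print(f"zero-shot MAE: {mae_zero_shot:.2f}")
print(f"SVR MAE: {mae_svr:.2f}")
print(f"relative reduction: {relative_reduction(mae_zero_shot, mae_svr):.0%}")
```

Because MAE is expressed in scale points, a value of 0.60 means predictions were off by 0.60 scale points on average.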

In practice

Implement LLM feature extraction for dementia assessment.
Utilize pause-enriched transcripts for automated screening.
Apply zero-shot LLMs for initial depression screening.
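
One way to produce a pause-enriched transcript is to insert a marker wherever the silence between two timed words exceeds a threshold. The sketch below assumes word-level timestamps are available, for example from an ASR system or forced aligner. The `[pause 1.5s]` marker format, the 0.5 second threshold, and the `enrich_with_pauses` name are hypothetical choices for illustration; the summary does not specify how the study encoded pauses.

```python
def enrich_with_pauses(words, min_pause=0.5):
    """Join timed words into one transcript line, marking long silent gaps.

    `words` is a list of (token, start_sec, end_sec) tuples in spoken order.
    A gap of at least `min_pause` seconds between consecutive words is
    rendered as "[pause X.Xs]" (a hypothetical marker format).
    """
    parts = []
    prev_end = None
    for token, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap >= min_pause:
                parts.append(f"[pause {gap:.1f}s]")
        parts.append(token)
        prev_end = end
    return " ".join(parts)


# Invented example timings, not taken from the study's interviews.
timed = [
    ("I", 0.0, 0.2),
    ("went", 0.3, 0.6),
    ("to", 2.1, 2.2),
    ("the", 2.3, 2.4),
    ("market", 3.4, 3.9),
]
print(enrich_with_pauses(timed))
# prints: I went [pause 1.5s] to the [pause 1.0s] market
```

The resulting string can then be passed to an LLM in place of a human transcript, which is what would allow the pipeline to run without manual transcription.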

Topics

Large Language Models
Dementia Assessment
Depression Assessment
Neuropsychiatric Disorders
Speech Analysis
Zero-shot Learning

Best for: NLP Engineer, AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.