Modeling Parkinson's Disease Progression Using Longitudinal Voice Biomarkers: A Comparative Study of Statistical and Neural Mixed-Effects Models

2026-04-20 · Source: stat.ML updates on arXiv.org · Field: Science & Research — Artificial Intelligence & Machine Learning, Data Science & Analytics, Health & Medical Research · Depth: Expert, extended

Summary

A comparative study benchmarked traditional statistical models against neural network mixed-effects models for predicting Parkinson's Disease (PD) progression from longitudinal voice biomarkers. The research used the Oxford Parkinson's telemonitoring voice dataset, comprising 5,875 records from 42 early-stage PD patients, to predict Total UPDRS scores. The models evaluated were Linear Mixed Models (LMMs), Generalized Additive Mixed Models (GAMMs), Generalized Neural Network Mixed Models (GNMMs) by Mandel et al. (2023), and Neural Mixed Effects (NME) models by Wörtwein et al. (2023). Contrary to expectations, GAMMs achieved the best predictive performance on a held-out test set, with the lowest Mean Squared Error (MSE = 6.56) and Mean Absolute Error (MAE = 2.00), outperforming all neural network models, whose errors were far higher (MSEs exceeding 96). This suggests that for datasets with many observations but a modest number of predictors, simpler models with explicit structure for within-subject correlation can be more effective.

Key takeaway

For AI Scientists and Machine Learning Engineers developing predictive models for longitudinal health data: when the data come from a small number of subjects (42 here) and a modest number of predictors, start with robust statistical methods such as Generalized Additive Mixed Models (GAMMs). Do not assume that more complex neural network architectures will automatically yield superior results; instead, prioritize models that explicitly handle within-subject correlation and remain interpretable. For practical telemedicine applications, consider integrating automatic variable selection and robust noise handling.

Key insights

Traditional statistical models, particularly GAMMs, outperformed complex neural networks in predicting Parkinson's progression from voice data.

Principles

Explicitly modeling within-subject correlation is crucial for longitudinal data.
Simpler models can excel with limited predictors and strong regularization.
Nonlinear temporal effects are important for disease progression modeling.
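
As a rough illustration of these principles, the generic textbook forms of a random-intercept LMM and a GAMM are sketched below. The notation is an assumption made for this summary (subject i, observation j, voice features x, time t, smooth functions f) and is not necessarily the exact specification used in the study.

```latex
% LMM with a subject-level random intercept b_i (generic form)
y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% GAMM: linear terms replaced by smooth functions, including a smooth of time
y_{ij} = \beta_0 + \sum_{k=1}^{p} f_k(x_{ijk}) + f_t(t_{ij}) + b_i + \varepsilon_{ij}

% Within-subject correlation implied by the random intercept (j \neq j')
\operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}
```

In this form the shared term b_i is what makes two observations from the same patient correlated, and the smooth f_t is one way to capture a nonlinear temporal effect.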

Method

The study compared LMMs, GAMMs, GNMMs, and NME models for predicting Total UPDRS from voice features. Each subject's last observation was held out as the test set, and the models were fit on the remaining observations.
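
The evaluation scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the field names (`subject`, `test_time`, `total_updrs`), the toy values, and the carry-forward baseline are all assumptions chosen for the example.

```python
from collections import defaultdict

def split_last_observation(records, subject_key="subject", time_key="test_time"):
    """Hold out each subject's latest record; earlier records form the training set."""
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec[subject_key]].append(rec)
    train, test = [], []
    for recs in by_subject.values():
        ordered = sorted(recs, key=lambda r: r[time_key])
        train.extend(ordered[:-1])   # all but the last visit
        test.append(ordered[-1])     # the last visit
    return train, test

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy records with made-up values (not taken from the dataset).
records = [
    {"subject": 1, "test_time": 5.0, "total_updrs": 30.0},
    {"subject": 1, "test_time": 12.0, "total_updrs": 31.0},
    {"subject": 1, "test_time": 19.0, "total_updrs": 33.0},
    {"subject": 2, "test_time": 4.0, "total_updrs": 20.0},
    {"subject": 2, "test_time": 11.0, "total_updrs": 21.0},
]
train, test = split_last_observation(records)

# Hypothetical baseline: carry each subject's last training value forward.
last_seen = {}
for rec in sorted(train, key=lambda r: r["test_time"]):
    last_seen[rec["subject"]] = rec["total_updrs"]

y_true = [rec["total_updrs"] for rec in test]
y_pred = [last_seen[rec["subject"]] for rec in test]
print("MSE:", mse(y_true, y_pred), "MAE:", mae(y_true, y_pred))
```

A subject with only one record would end up in the test set with no training data, so a real pipeline would need to handle that case explicitly.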

In practice

Consider GAMMs as a strong baseline for longitudinal disease progression modeling.
Prioritize variable selection in neural mixed-effects models.
Consider log-transforming skewed outcome variables for normality.
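
The log-transform suggestion can be illustrated with a small, hedged sketch. The scores below are made-up values, and the `skewness` helper is a hypothetical function written for this example; it only shows that a log transform tends to reduce right skew and that predictions can be mapped back to the original scale with the exponential.

```python
import math

def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, strictly positive, right-skewed outcome values.
scores = [5.0, 6.0, 7.0, 8.0, 40.0]

# Fit on the log scale, then map back with exp for reporting.
log_scores = [math.log(v) for v in scores]
restored = [math.exp(z) for z in log_scores]

print("skewness before:", skewness(scores))
print("skewness after: ", skewness(log_scores))
```

If a model is fit on the log scale, error metrics such as MSE and MAE should be computed after back-transforming so that they stay comparable with models fit on the original scale. Note also that exponentiating a mean prediction on the log scale estimates the median rather than the mean on the original scale.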

Topics

Parkinson's Disease Progression
Voice Biomarkers
Longitudinal Data Analysis
Generalized Additive Mixed Models
Neural Mixed-Effects Models

Code references

RanTongUTD/Parkinson-Prediction

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.