Measurement noise limits the advantage of nonlinear models over linear models in biomedical prediction
Summary
A new analysis challenges the common assumption that flexible models underperform linear models on biomedical tabular data due to model or data limitations. Instead, measurement noise is identified as the primary constraint, blurring the population-optimal predictor and erasing nonlinear structure faster than linear structure. Specifically, a degree-$k$ interaction's contribution to excess risk is attenuated by the $k$-th power of feature reliability ($\rho^k$), while linear components are attenuated only by $\rho$. This differential attenuation means that at typical biomedical measurement reliabilities (e.g., 0.5 for noisier features), the potential advantage of flexible models can vanish, even if the underlying biology is strongly nonlinear. The study, which assembled classical results from epidemiology, psychometrics, and Gaussian analysis into an exact excess-risk identity, found that across 140 UK Biobank tasks, only 20 showed a measurable performance gap, and in 19 of these, injecting noise preferentially reduced the nonlinear advantage. Modalities like resting-state functional connectivity (reliability 0.2-0.3) showed no gap, reinforcing that flexible models succeed only when feature reliability, representation, and sample size align.
Key takeaway
For Machine Learning Engineers developing predictive models for biomedical tabular data, if your flexible models (e.g., deep networks, gradient-boosted trees) fail to outperform linear regression, do not immediately assume the biology is linear or that your model or data is insufficient. Instead, investigate feature measurement reliability as a binding constraint. Prioritize improving measurement quality or feature engineering for higher reliability, as this is often more impactful than increasing sample size or model complexity. You should report feature test-retest reliability alongside sample size and dimension to provide crucial context for model performance.
Key insights
Measurement noise, not model inadequacy, often limits flexible models' advantage over linear models in biomedical prediction.
Principles
- Nonlinear structure attenuates by reliability to the $k$-th power ($\rho^k$).
- Linear structure attenuates only by reliability ($\rho$).
- Noise-hidden nonlinearity cannot be recovered by more data or flexible models.
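The differential attenuation in these principles can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the study's code: features are independent standard Gaussians, measurement error is additive and Gaussian, and every feature has the same reliability. The names `pearson` and `recovered_fraction` are hypothetical helpers introduced for illustration. The sketch measures how much of a degree-$k$ product of true features remains visible in the product of the noisy measurements, which should be close to $\rho^k$.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def recovered_fraction(rho, k, n=100_000, seed=0):
    """Squared correlation between a product of k true features and the
    product of their noisy measurements (assumed setup, see lead-in).

    rho is the reliability Var(true) / Var(measured), with 0 < rho <= 1.
    """
    rng = random.Random(seed)
    # With Var(true) = 1, this noise level gives reliability exactly rho.
    noise_sd = math.sqrt((1.0 - rho) / rho)
    true_terms = []
    measured_terms = []
    for _ in range(n):
        true_product = 1.0
        measured_product = 1.0
        for _ in range(k):
            x = rng.gauss(0.0, 1.0)            # true feature value
            w = x + rng.gauss(0.0, noise_sd)   # noisy measurement of it
            true_product *= x
            measured_product *= w
        true_terms.append(true_product)
        measured_terms.append(measured_product)
    return pearson(true_terms, measured_terms) ** 2


if __name__ == "__main__":
    rho = 0.5
    for k in (1, 2, 3):
        simulated = recovered_fraction(rho, k)
        print(f"k={k}: simulated={simulated:.3f}, rho**k={rho ** k:.3f}")
```

At a reliability of 0.5, the theoretical values are 0.5 for a linear term, 0.25 for a pairwise interaction, and 0.125 for a three-way interaction, so the higher-order signal shrinks much faster than the linear one.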
Method
Classical results from regression dilution, reliability theory, and Gaussian analysis are assembled into an exact excess-risk identity for the nonlinear advantage.
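One standard Gaussian fact that produces the $\rho^k$ scaling is sketched below. It is offered as an illustration of the kind of ingredient involved, not as the paper's exact identity. The symbols are assumptions introduced here: $X$ is a standardized true feature, $W$ its standardized noisy measurement, $r$ their correlation (so that $\rho = r^2$ under classical additive error), and $\mathrm{He}_k$ the degree-$k$ probabilists' Hermite polynomial.

```latex
% Assumption: (X, W) are standard bivariate normal with corr(X, W) = r,
% and the reliability is rho = r^2 (classical additive measurement error).
\mathbb{E}\left[\mathrm{He}_k(X) \mid W\right] = r^k \, \mathrm{He}_k(W),
\qquad
\frac{\operatorname{Var}\left(\mathbb{E}\left[\mathrm{He}_k(X) \mid W\right]\right)}
     {\operatorname{Var}\left(\mathrm{He}_k(X)\right)}
= r^{2k} = \rho^k .
```

For $k = 1$ this reduces to the familiar regression-dilution factor $\rho$; for $k \geq 2$ the recoverable share of a degree-$k$ component falls geometrically in $k$.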
In practice
- Evaluate feature reliability before deploying complex models.
- When flexible and linear models tie, intervene (inject noise, add data, or change the representation) to diagnose which constraint is binding.
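The first check above can be sketched as follows, assuming each feature was measured twice on the same subjects and that the two visits behave as parallel measurements, so their correlation estimates the reliability $\rho$. The function names `retest_reliability` and `retained_fraction` are hypothetical, and the demo data are simulated, not taken from the study.

```python
import math
import random


def retest_reliability(visit_1, visit_2):
    """Pearson correlation between two repeated measurements of one feature.

    Under the parallel-measurement assumption stated above, this
    estimates the feature's reliability rho.
    """
    n = len(visit_1)
    mean_1 = sum(visit_1) / n
    mean_2 = sum(visit_2) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(visit_1, visit_2))
    var_1 = sum((a - mean_1) ** 2 for a in visit_1)
    var_2 = sum((b - mean_2) ** 2 for b in visit_2)
    return cov / math.sqrt(var_1 * var_2)


def retained_fraction(rho, degree):
    """Share of a degree-`degree` component that survives at reliability rho."""
    return rho ** degree


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated subjects: unit-variance true value plus unit-variance noise
    # at each visit, so the true reliability is 0.5.
    truth = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    visit_1 = [t + rng.gauss(0.0, 1.0) for t in truth]
    visit_2 = [t + rng.gauss(0.0, 1.0) for t in truth]
    rho_hat = retest_reliability(visit_1, visit_2)
    for degree in (1, 2, 3):
        print(degree, round(retained_fraction(rho_hat, degree), 3))
```

As a reading aid, a feature with reliability 0.25, within the range quoted above for resting-state functional connectivity, retains only 0.0625 of a pairwise interaction under this scaling.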
Topics
- Biomedical Prediction
- Measurement Noise
- Linear Models
- Nonlinear Models
- Feature Reliability
- UK Biobank
- Machine Learning Theory
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.