Feature selection leads to divergent neurobiological interpretations of brain-based machine learning biomarkers

Source: Machine learning : nature.com subject feeds · Field: Science & Research (Life Sciences & Biology, Health & Medical Research, Mathematics & Computational Sciences) · Depth: Expert, extended

Summary

A study involving over 12,000 participants across four large-scale neuroimaging datasets (HBN, ABCD, HCPD, PNC) and 13 outcomes demonstrates that univariate feature selection in brain-based machine learning models can lead to incomplete and potentially misleading neurobiological interpretations. Researchers found that features typically discarded by selection methods can achieve significant prediction accuracies, often comparable to those of top-ranked features, across cognitive, developmental, and psychiatric phenotypes. These results hold for both functional connectivity (fMRI) and structural (diffusion tensor imaging) connectomes and are robust in external validation. The findings suggest that focusing solely on the most prominent features oversimplifies the complex, widely distributed neural circuits underlying brain-behavior associations, potentially contributing to reproducibility issues in the field. The study reinforces the importance of considering subtle, brain-wide signals.

Key takeaway

AI Scientists and Research Scientists developing brain-based predictive models should critically re-evaluate their reliance on univariate feature selection. Models built only on top-ranked features may overlook complementary neurobiological signals that offer comparable predictive power and could reveal distinct patient subtypes. Exploring lower-ranked feature sets can give a more complete picture of brain-behavior associations and may identify novel, anatomically accessible targets for intervention, which in turn could improve model generalizability and clinical utility.

Key insights

Discarded brain features can predict phenotypes with accuracy comparable to top-ranked features, yielding divergent neurobiological interpretations.

Method

A decile-based feature ranking paradigm was used, partitioning connectome features into ten non-overlapping subsets based on their association strength with a target phenotype, then evaluating each subset's predictive accuracy.
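The decile paradigm described above can be illustrated with a minimal sketch. It is not the study's pipeline: the use of absolute Pearson correlation as the association measure, ridge regression as the predictive model, a single train/test split, and every function name here (`decile_subsets`, `ridge_fit_predict`, `evaluate_deciles`) are assumptions chosen for illustration. Ranking is done on training data only, so the held-out score for each decile is not inflated by selection leakage.

```python
import numpy as np

def decile_subsets(X_train, y_train, n_bins=10):
    """Rank features by |Pearson r| with the target (training data only)
    and split the ranking into n_bins non-overlapping subsets."""
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get r = 0 instead of a division-by-zero NaN.
    r = (Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    order = np.argsort(-np.abs(r))  # strongest association first
    return np.array_split(order, n_bins)

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Ridge regression in dual form (assumed model, convenient when
    there are more features than participants)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0
    Z_train = (X_train - mu) / sd
    Z_test = (X_test - mu) / sd
    y_mean = y_train.mean()
    gram = Z_train @ Z_train.T
    dual = np.linalg.solve(gram + alpha * np.eye(len(y_train)), y_train - y_mean)
    return Z_test @ (Z_train.T @ dual) + y_mean

def evaluate_deciles(X, y, n_bins=10, test_frac=0.2, alpha=1.0, seed=0):
    """Return the feature subsets and the held-out prediction accuracy
    (Pearson r between predicted and observed) for each subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    subsets = decile_subsets(X[train], y[train], n_bins)
    scores = []
    for cols in subsets:
        pred = ridge_fit_predict(X[train][:, cols], y[train], X[test][:, cols], alpha)
        scores.append(np.corrcoef(pred, y[test])[0, 1])
    return subsets, np.array(scores)

if __name__ == "__main__":
    # Synthetic stand-in for connectome edges: many weak, distributed effects.
    rng = np.random.default_rng(42)
    X = rng.standard_normal((400, 1000))
    y = X @ rng.normal(0, 0.05, 1000) + rng.standard_normal(400)
    _, scores = evaluate_deciles(X, y)
    for rank, score in enumerate(scores, start=1):
        print(f"decile {rank:2d}: held-out r = {score:.3f}")
```

Comparing the per-decile scores, rather than reporting only the first decile, is what exposes whether lower-ranked features carry usable signal. In a real analysis the single split would normally be replaced by repeated cross-validation and external validation, as the summary above indicates the study did.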

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.