Benchmarking large language models for cell-free RNA diagnostic biomarker discovery
Summary
A study benchmarked six large language models from OpenAI, Anthropic, and Google for diagnostic biomarker discovery using plasma cell-free RNA datasets. The evaluation spanned three clinical cohorts: Kawasaki disease versus multisystem inflammatory syndrome in children, active tuberculosis versus symptomatic respiratory controls, and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) versus sedentary controls. Models were assessed on two tasks: literature-guided gene panel nomination for downstream machine learning, and autonomous end-to-end classifier construction from raw count matrices. Despite prompt adherence issues, model-nominated panels recapitulated canonical immune pathways and consistently outperformed random panels, matching differential gene expression baselines in the tuberculosis cohort. End-to-end automation proved feasible, but performance depended on the model and the task: one model approached conventional performance levels for Kawasaki disease but performed worse for tuberculosis and ME/CFS.
Key takeaway
For research scientists developing diagnostic tools, this study indicates that large language models can aid biomarker discovery. Consider using LLMs for literature-guided gene panel nomination: this approach consistently outperformed random panels and matched differential gene expression baselines in the tuberculosis cohort. End-to-end automation is possible, but evaluate specific models and tasks carefully, because performance varies. Prioritize LLM applications where robust panel identification is critical, and expect task-dependent outcomes in fully automated classification pipelines.
Key insights
Large language models show promise for diagnostic biomarker discovery, particularly in gene panel nomination.
Principles
- Model-nominated panels recapitulate canonical immune pathways.
- End-to-end automation is feasible but model- and task-dependent.
Method
The study benchmarked six LLMs on plasma cell-free RNA datasets across three clinical cohorts, evaluating two tasks: literature-guided gene panel nomination and autonomous end-to-end classifier construction.
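The comparison of a nominated gene panel against random panels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the study: the data are synthetic, the classifier is a simple nearest-centroid rule scored by leave-one-out accuracy, and the names `make_synthetic_cohort`, `loo_accuracy`, and `random_panel_baseline` are hypothetical.

```python
import random

def make_synthetic_cohort(n_per_class=20, n_genes=200, n_informative=10,
                          shift=1.5, seed=0):
    """Synthetic expression values (NOT real cfRNA data).

    Only the first `n_informative` genes differ between the two classes.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(shift * label if g < n_informative else 0.0, 1.0)
                   for g in range(n_genes)]
            samples.append(row)
            labels.append(label)
    return samples, labels

def loo_accuracy(samples, labels, panel):
    """Leave-one-out accuracy of a nearest-centroid classifier on `panel` genes."""
    classes = sorted(set(labels))
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        distances = {}
        for cls in classes:
            # Training rows of this class, excluding the held-out sample i.
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if j != i and l == cls]
            centroid = [sum(r[g] for r in rows) / len(rows) for g in panel]
            distances[cls] = sum((x[g] - c) ** 2 for g, c in zip(panel, centroid))
        predicted = min(distances, key=distances.get)
        correct += predicted == y
    return correct / len(samples)

def random_panel_baseline(samples, labels, panel_size, n_draws=50, seed=1):
    """Accuracies of `n_draws` random panels with the same size as the test panel."""
    rng = random.Random(seed)
    n_genes = len(samples[0])
    return [loo_accuracy(samples, labels, rng.sample(range(n_genes), panel_size))
            for _ in range(n_draws)]

samples, labels = make_synthetic_cohort()
# Stand-in for an LLM-nominated panel: here simply the informative gene indices.
nominated_panel = list(range(10))
nominated_acc = loo_accuracy(samples, labels, nominated_panel)
random_accs = random_panel_baseline(samples, labels, len(nominated_panel))
mean_random_acc = sum(random_accs) / len(random_accs)
# Fraction of random panels that do at least as well as the nominated panel.
frac_random_at_least = sum(a >= nominated_acc for a in random_accs) / len(random_accs)
print(f"nominated: {nominated_acc:.2f}, random mean: {mean_random_acc:.2f}, "
      f"random >= nominated: {frac_random_at_least:.2f}")
```

Matching the random panels to the nominated panel's size keeps the comparison fair, since larger panels tend to score higher regardless of which genes they contain.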
In practice
- LLMs can nominate diagnostic gene panels for machine learning.
- LLMs can construct end-to-end classifiers from raw count matrices.
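As a rough illustration of what starting from raw count matrices involves, the sketch below applies a log counts-per-million transform before any classifier is fit. This summary does not say which preprocessing the models chose, so the transform, the function name `log_cpm`, and the `prior` parameter are assumptions for illustration only.

```python
import math

def log_cpm(counts, prior=1.0):
    """Convert raw counts to log2 counts-per-million.

    `counts` is a list of samples, each a list of raw per-gene counts.
    `prior` is added before the log so that zero counts stay finite.
    """
    normalized = []
    for row in counts:
        library_size = sum(row)
        if library_size == 0:
            raise ValueError("sample has zero total counts")
        normalized.append(
            [math.log2(c / library_size * 1e6 + prior) for c in row]
        )
    return normalized

# Toy matrix: two samples, three genes (values are made up).
raw_counts = [
    [10, 90, 0],
    [5, 45, 50],
]
log_cpm_matrix = log_cpm(raw_counts)
```

Scaling by library size makes samples with different sequencing depths comparable, which matters before feeding counts to a downstream classifier.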
Topics
- Large Language Models
- Biomarker Discovery
- Cell-free RNA
- Diagnostic Biomarkers
- Omics Data
- Machine Learning Benchmarking
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.