Benchmarking large language models for cell-free RNA diagnostic biomarker discovery

2026-06-11 · Source: Machine learning : nature.com subject feeds · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research, Life Sciences & Biology · Depth: Expert, short

Summary

A study benchmarked six large language models from OpenAI, Anthropic, and Google for diagnostic biomarker discovery using plasma cell-free RNA datasets. The evaluation spanned three clinical cohorts: Kawasaki disease versus multisystem inflammatory syndrome in children, active tuberculosis versus symptomatic respiratory controls, and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) versus sedentary controls. Models were assessed on two tasks: literature-guided nomination of gene panels for downstream machine learning, and autonomous end-to-end construction of classifiers from raw count matrices. Despite prompt adherence issues, model-nominated panels recapitulated canonical immune pathways and consistently outperformed random panels, matching differential gene expression baselines in the tuberculosis cohort. End-to-end automation proved feasible, but performance depended on model and task: one model approached the performance of conventional pipelines for Kawasaki disease but performed worse on tuberculosis and ME/CFS.

Key takeaway

For research scientists developing diagnostic tools, this study indicates that large language models can aid biomarker discovery, most reliably in literature-guided gene panel nomination: nominated panels consistently outperformed random selections and matched differential gene expression baselines in the tuberculosis cohort. End-to-end automation is possible, but performance varies by model and task, so evaluate any fully automated classification pipeline on the specific cohort before relying on it.

Key insights

Large language models show promise for diagnostic biomarker discovery, particularly in gene panel nomination.

Principles

Model-nominated panels recapitulate canonical immune pathways.
End-to-end automation is feasible but model- and task-dependent.

Method

Benchmarking six LLMs on plasma cell-free RNA datasets across three clinical cohorts, evaluating literature-guided gene panel nomination and autonomous end-to-end classifier construction.
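The comparison of a nominated panel against random panels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the gene names (`G0`, `G1`, ...) and `nominated_panel` are hypothetical, and the mean-expression score with an in-sample rank-based AUC is a simplified stand-in for the downstream machine learning used in the study, which this summary does not describe in detail.

```python
import random
import statistics

def auc(scores, labels):
    """Rank-based AUC: chance that a case outscores a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def panel_auc(expr, labels, panel):
    """Score each sample by mean expression over the panel, then take a
    direction-agnostic AUC so down-regulated panels are not penalised."""
    scores = [statistics.fmean([sample[g] for g in panel]) for sample in expr]
    a = auc(scores, labels)
    return max(a, 1.0 - a)

# Synthetic, made-up data: 20 controls and 20 cases, 50 genes,
# with the first 5 genes shifted upward in cases.
rng = random.Random(0)
genes = [f"G{i}" for i in range(50)]
signal = set(genes[:5])
labels = [0] * 20 + [1] * 20
expr = [
    {g: rng.gauss(2.0 if (y == 1 and g in signal) else 0.0, 1.0) for g in genes}
    for y in labels
]

nominated_panel = genes[:5]  # hypothetical stand-in for a model-nominated panel
nominated_auc = panel_auc(expr, labels, nominated_panel)

# Null distribution: random panels of the same size as the nominated one.
random_aucs = [
    panel_auc(expr, labels, rng.sample(genes, len(nominated_panel)))
    for _ in range(200)
]
# Fraction of random panels that match or beat the nominated panel.
empirical_p = sum(a >= nominated_auc for a in random_aucs) / len(random_aucs)

print(f"nominated AUC = {nominated_auc:.3f}")
print(f"median random AUC = {statistics.median(random_aucs):.3f}")
print(f"fraction of random panels >= nominated = {empirical_p:.3f}")
```

Drawing random panels of the same size gives a null distribution against which the nominated panel can be ranked. On real data the score would normally come from a cross-validated classifier rather than an in-sample mean, since in-sample estimates are optimistic.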

In practice

LLMs can nominate diagnostic gene panels for machine learning.
LLMs can construct end-to-end classifiers from raw count matrices.
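As a rough picture of what "end-to-end from raw count matrices" involves, the sketch below normalizes raw counts and evaluates a simple classifier with leave-one-out cross-validation. It is an assumption-laden illustration, not the pipeline any model in the study produced: the counts are synthetic, and log-CPM normalization with a nearest-centroid classifier is chosen here only because it is compact.

```python
import math
import random

def log_cpm(counts):
    """Counts-per-million for one sample, then log2(x + 1)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def centroid(rows):
    """Per-gene mean across a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([x for x, l in train if l == lab]) for lab in set(y)
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)

# Synthetic raw counts (made up): 15 controls, 15 cases, 30 genes.
# Each sample has its own sequencing depth; 5 genes are 4-fold higher in cases.
rng = random.Random(1)
n_genes = 30
labels = [0] * 15 + [1] * 15
raw_counts = []
for y in labels:
    depth = rng.uniform(0.5, 2.0)
    row = []
    for g in range(n_genes):
        mean = 100.0 * depth * (4.0 if (y == 1 and g < 5) else 1.0)
        # Gaussian approximation to count noise, clipped at zero.
        row.append(max(0, round(rng.gauss(mean, math.sqrt(mean)))))
    raw_counts.append(row)

X = [log_cpm(row) for row in raw_counts]
accuracy = loo_accuracy(X, labels)
print(f"leave-one-out accuracy = {accuracy:.2f}")
```

The normalization step matters because raw counts scale with sequencing depth; without it, the classifier could separate samples by library size rather than biology. Holding out each sample in turn keeps the accuracy estimate from being computed on data used to fit the centroids.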

Topics

Large Language Models
Biomarker Discovery
Cell-free RNA
Diagnostic Biomarkers
Omics Data
Machine Learning Benchmarking

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.