Benchmarking large language models for cell-free RNA diagnostic biomarker discovery

· Source: Machine learning : nature.com subject feeds · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research, Life Sciences & Biology · Depth: Expert, short

Summary

A study benchmarked six large language models (LLMs) from OpenAI, Anthropic, and Google for diagnostic biomarker discovery using plasma cell-free RNA (cfRNA) datasets. The evaluation spanned three clinical cohorts: Kawasaki disease versus multisystem inflammatory syndrome in children (MIS-C), active tuberculosis versus symptomatic respiratory controls, and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) versus sedentary controls. Models were assessed on two tasks: literature-guided nomination of gene panels for downstream machine learning, and autonomous end-to-end construction of classifiers from raw count matrices. Despite imperfect adherence to prompts, model-nominated panels recapitulated canonical immune pathways and consistently outperformed random panels, even matching differential gene expression baselines in the tuberculosis cohort. End-to-end automation proved feasible, but performance depended on both model and task: one model approached the performance of conventional approaches for Kawasaki disease but performed less well for tuberculosis and ME/CFS.

Key takeaway

For Research Scientists developing diagnostic tools, this study indicates that large language models can meaningfully aid biomarker discovery. You should consider integrating LLMs for literature-guided gene panel nomination: this approach consistently outperformed random panels and, in the tuberculosis cohort, matched differential expression baselines. End-to-end automation is possible, but evaluate specific models and tasks carefully, since performance varies. Prioritize LLM applications where robust panel identification is critical, and expect task-dependent outcomes from fully automated classification pipelines.

Key insights

Large language models show promise for diagnostic biomarker discovery, particularly in gene panel nomination.

Method

Benchmarking six LLMs on plasma cell-free RNA datasets across three clinical cohorts, evaluating literature-guided gene panel nomination and autonomous end-to-end classifier construction.
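The summary reports that model-nominated panels outperformed random panels. One common way to frame such a comparison is an empirical null: score many random gene panels of the same size and ask how often they do at least as well as the nominated panel. The sketch below is a minimal illustration under stated assumptions, not the study's protocol. `random_panel_pvalue` is a hypothetical helper. `score_fn` stands in for whatever downstream metric one would use in practice, such as a cross-validated AUROC from a classifier trained on the cfRNA count matrix. The gene IDs and the toy score are invented purely for illustration.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def random_panel_pvalue(
    panel: Sequence[str],
    gene_universe: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    n_random: int = 1000,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Compare a nominated panel's score with random panels of equal size.

    Hypothetical helper. Returns (panel_score, mean_random_score, empirical_p),
    where empirical_p is the add-one-smoothed fraction of random panels that
    score at least as well as the nominated panel.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    pool = list(gene_universe)
    k = len(panel)
    observed = score_fn(panel)
    # Null distribution: scores of random panels drawn without replacement.
    random_scores = [score_fn(rng.sample(pool, k)) for _ in range(n_random)]
    n_ge = sum(s >= observed for s in random_scores)
    empirical_p = (n_ge + 1) / (n_random + 1)
    return observed, mean(random_scores), empirical_p


# Toy example. Gene IDs and the "informative" set are hypothetical.
universe = [f"GENE_{i}" for i in range(100)]
informative = set(universe[:10])


def toy_score(panel: Sequence[str]) -> float:
    # Stand-in for a real classifier metric: fraction of panel genes
    # that belong to the (hypothetical) informative set.
    return sum(g in informative for g in panel) / len(panel)


obs, null_mean, p = random_panel_pvalue(universe[:5], universe, toy_score)
print(f"panel={obs:.2f} random_mean={null_mean:.3f} p={p:.4f}")
```

Matching the random panels' size to the nominated panel keeps the comparison fair, since larger panels tend to score higher regardless of content. The add-one correction prevents reporting an empirical p-value of exactly zero from a finite number of draws.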

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.