Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors

2026-06-07 · Source: Machine Learning · Field: Science & Research — Life Sciences & Biology, Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, quick

Summary

A new study shows that simple models combined with biological knowledge graphs can effectively predict transcriptomic gene expression changes from unseen gene knockout perturbations. The researchers found that a K-nearest neighbour (KNN) approach, which uses a knowledge graph to find similar perturbations, is highly competitive and outperforms almost all other methods on out-of-distribution perturbation prediction. Performance improves further when Large Language Models (LLMs) optimized through reinforcement learning (RL) refine the KNN neighbourhood. This RL-trained LLM approach achieves performance equivalent to leading methods on cell lines from Replogle et al. (2022). RL training also improved the LLM's downstream differential expression prediction, even though the model was not directly trained for that task. These findings underscore the utility of knowledge graphs as model priors and suggest RL's potential for developing generalizable LLMs for complex biological response prediction.

Key takeaway

Research scientists developing virtual cell models or predicting gene expression changes should prioritize integrating biological knowledge graphs as model priors. Even with a simple K-nearest neighbour method, this approach offers highly competitive out-of-distribution prediction. Fine-tuning Large Language Models with reinforcement learning to refine these predictions is also worth considering: it achieves performance comparable to leading approaches and improves downstream differential expression prediction. Together, these steps can improve the generalizability and accuracy of biological response predictions.

Key insights

Simple K-nearest neighbour models leveraging knowledge graphs, enhanced by RL-trained LLMs, offer competitive transcriptomic perturbation prediction.

Principles

Knowledge graphs serve as effective model priors.
RL refines LLMs for generalizable biological prediction.
Simple models can achieve competitive performance.

Method

A K-nearest neighbour model uses a knowledge graph to find similar perturbations. An RL-optimized LLM then refines this neighbourhood to improve predictive performance on transcriptomic gene expression.
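
The neighbour-averaging idea can be illustrated with a minimal Python sketch. It is not the study's implementation: the summary does not specify the similarity measure, the graph, or the aggregation rule, so this sketch assumes Jaccard similarity over each gene's knowledge graph neighbours and a plain mean over the k most similar seen perturbations. The gene names, graph entries, expression values, and the optional `refine` hook (a stand-in for the RL-trained LLM that edits the neighbourhood) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of knowledge graph neighbours."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(
    target: str,
    kg: Dict[str, Set[str]],
    observed: Dict[str, List[float]],
    k: int = 2,
    refine: Optional[Callable[[str, List[str]], List[str]]] = None,
) -> List[float]:
    """Predict the expression change for an unseen perturbation.

    kg maps each gene to its knowledge graph neighbours (assumed format).
    observed maps each seen perturbation to its measured expression change.
    refine is a hypothetical hook that may prune or reorder the neighbourhood.
    """
    target_nbrs = kg.get(target, set())
    # Only perturbations seen in training are candidate neighbours.
    candidates = [g for g in observed if g != target]
    # Sort by descending similarity; ties are broken by gene name.
    ranked = sorted(
        candidates,
        key=lambda g: (-jaccard(target_nbrs, kg.get(g, set())), g),
    )
    neighbours = ranked[:k]
    if refine is not None:
        # Fall back to the unrefined neighbourhood if the hook returns nothing.
        neighbours = refine(target, neighbours) or neighbours
    n_genes = len(observed[neighbours[0]])
    # Average the neighbours' expression changes, one value per readout gene.
    return [
        sum(observed[g][i] for g in neighbours) / len(neighbours)
        for i in range(n_genes)
    ]


# Toy, made-up data: graph neighbours and two readout genes per perturbation.
TOY_KG = {
    "GENE_A": {"p1", "p2", "p3"},
    "GENE_B": {"p1", "p2"},
    "GENE_C": {"p3", "p4"},
    "GENE_D": {"p5"},
}
TOY_OBSERVED = {
    "GENE_B": [1.0, -2.0],
    "GENE_C": [3.0, 0.0],
    "GENE_D": [-5.0, 5.0],
}

prediction = knn_predict("GENE_A", TOY_KG, TOY_OBSERVED, k=2)
```

In this toy setup the unseen knockout `GENE_A` is predicted from `GENE_B` and `GENE_C`, its two closest neighbours in the graph. Passing a `refine` function shows where a learned neighbourhood selector would plug in without changing the averaging step.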

In practice

Apply knowledge graphs for out-of-distribution prediction.
Fine-tune LLMs with RL for biological response tasks.
Evaluate K-nearest neighbour as a strong baseline for perturbation prediction.

Topics

Knowledge Graphs
Large Language Models
Reinforcement Learning
Transcriptomics
Gene Expression Prediction
K-nearest Neighbour

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.