TxPert: using multiple knowledge graphs for prediction of transcriptomic perturbation effects

2026-05-01 · Source: Machine learning : nature.com subject feeds · Field: Science & Research — Life Sciences & Biology, Mathematics & Computational Sciences, Health & Medical Research · Depth: Expert, extended

Summary

TxPert is a deep learning method for predicting transcriptomic perturbation effects, a task central to understanding disease mechanisms and developing therapies. The latent-transfer model integrates multiple knowledge graphs (KGs) of gene-gene relationships, drawing on biological databases such as STRING and GO as well as proprietary high-throughput perturbation screens (PxMap, TxMap). For single unseen perturbations, TxPert achieves performance comparable to split-half experimental reproducibility; for double unseen perturbations and for single perturbations in different cell lines, it improves Pearson Δ by 8–25% over existing methods. The framework also introduces improved benchmarking practices, such as batch-appropriate control matching and retrieval metrics, addressing limitations of previous deep learning models in this domain, which often underperformed simpler baselines. Its robust performance across these out-of-distribution tasks demonstrates the value of combining diverse biological knowledge graphs.

Key takeaway

For AI Scientists and Machine Learning Engineers developing computational models for drug discovery, TxPert offers a robust framework for predicting transcriptomic perturbation effects in out-of-distribution scenarios. Consider integrating multiple biological knowledge graphs and adopting rigorous benchmarking practices, including batch-matched controls and retrieval metrics, to improve model accuracy and generalizability. This approach can accelerate the design of effective therapeutic interventions by reducing the need for exhaustive wet-lab screening.

Key insights

Combining multiple biological knowledge graphs significantly enhances transcriptomic perturbation prediction in out-of-distribution settings.

Principles

Batch-matched controls are essential for accurate biological data modeling.
Retrieval metrics and Pearson Δ effectively assess perturbation-specific signal.
Integrating diverse KGs provides complementary information for improved prediction.
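
To make the two evaluation ideas concrete, here is a minimal sketch in plain Python. It assumes the commonly used definition of Pearson Δ (the Pearson correlation between predicted and observed expression changes relative to the control mean) and a simple rank-based retrieval score. The function names, the toy profiles, and the exact form of the retrieval score are illustrative assumptions, not taken from the TxPert code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def delta(profile, control_mean):
    """Expression change relative to the control mean, gene by gene."""
    return [p - c for p, c in zip(profile, control_mean)]

def pearson_delta(predicted, observed, control_mean):
    """Correlate changes, not raw profiles, so the shared baseline cannot inflate the score."""
    return pearson(delta(predicted, control_mean), delta(observed, control_mean))

def retrieval_rank(query, predicted, observed, control_mean):
    """0-based rank of the true perturbation when all observed profiles are
    sorted by similarity to the prediction for `query` (0 means retrieved first).
    Hypothetical scoring rule, for illustration only."""
    pred_delta = delta(predicted[query], control_mean)
    scores = {
        name: pearson(pred_delta, delta(profile, control_mean))
        for name, profile in observed.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(query)

# Toy log-expression profiles over four genes (made-up numbers).
control_mean = [1.0, 1.0, 1.0, 1.0]
observed = {"A": [2.0, 1.0, 0.0, 1.0], "B": [1.0, 3.0, 1.0, 0.0]}
predicted = {"A": [1.8, 1.1, 0.3, 1.0], "B": [1.1, 2.5, 1.0, 0.4]}

score_a = pearson_delta(predicted["A"], observed["A"], control_mean)
rank_a = retrieval_rank("A", predicted, observed, control_mean)
rank_b = retrieval_rank("B", predicted, observed, control_mean)
```

A prediction that merely reproduces the control profile has no signal in Δ space, which is why both scores are computed on changes rather than on raw expression.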

Method

TxPert uses a latent-transfer deep learning architecture with a basal state encoder and a perturbation encoder, leveraging graph neural networks (GNNs) on multiple KGs to predict log-transformed gene expression profiles.
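
The data flow described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the TxPert implementation: the weights are random and untrained, the GNN is reduced to a single mean-aggregation step per graph, the per-graph views are fused by averaging, combinations are handled by summing latents, and the perturbation latent is simply added to the basal latent. All names, dimensions, and graph edges are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def propagate(embeddings, graph):
    """One mean-aggregation message-passing step over a single KG:
    each gene averages its own embedding with those of its neighbors."""
    out = {}
    for gene, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in graph.get(gene, [])]
        out[gene] = [sum(col) / len(group) for col in zip(*group)]
    return out

def encode_perturbation(perturbed_genes, embeddings, graphs):
    """Average each perturbed gene's per-KG views, then sum over genes (assumed fusion rule)."""
    dim = len(next(iter(embeddings.values())))
    views = [propagate(embeddings, g) for g in graphs.values()]
    latent = [0.0] * dim
    for gene in perturbed_genes:
        fused = [sum(col) / len(views) for col in zip(*(v[gene] for v in views))]
        latent = [a + b for a, b in zip(latent, fused)]
    return latent

def predict(control_expr, perturbed_genes, params, graphs):
    """Basal latent + perturbation latent, decoded back to log-expression space."""
    z_basal = matvec(params["W_basal"], control_expr)
    z_pert = encode_perturbation(perturbed_genes, params["gene_emb"], graphs)
    z = [a + b for a, b in zip(z_basal, z_pert)]
    return matvec(params["W_dec"], z)

genes = ["G1", "G2", "G3"]
latent_dim = 2
rng = random.Random(0)
params = {
    "W_basal": random_matrix(latent_dim, len(genes), rng),
    "W_dec": random_matrix(len(genes), latent_dim, rng),
    "gene_emb": {g: [rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for g in genes},
}
# Two toy graphs standing in for different KGs; the edges are invented.
graphs = {
    "toy_kg_a": {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"]},
    "toy_kg_b": {"G1": ["G3"], "G3": ["G1"]},
}
control_expr = [1.2, 0.4, 2.0]  # toy log-transformed control profile
single = predict(control_expr, ["G1"], params, graphs)
double = predict(control_expr, ["G1", "G2"], params, graphs)
```

The point of the sketch is the separation of concerns: the basal encoder sees only the unperturbed state, the perturbation encoder sees only graph structure, and a gene never perturbed in training still receives an embedding through its neighbors in each KG.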

In practice

Use batch-matched controls to mitigate experimental batch effects.
Employ Pearson Δ and retrieval metrics for robust model evaluation.
Integrate multiple biological KGs (e.g., STRING, GO, PxMap, TxMap) for enhanced prediction.
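
The first recommendation can be illustrated with a short sketch of batch-matched control subtraction: each perturbed profile is compared against the mean of the controls from its own batch instead of a pooled control. The record layout (the "batch", "perturbation", and "expr" keys), the "control" label, and the numbers are assumptions made for the example.

```python
from collections import defaultdict

def mean_profile(profiles):
    """Gene-wise mean over a list of expression profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def batch_matched_deltas(cells):
    """Subtract the same-batch control mean from every perturbed profile."""
    controls = defaultdict(list)
    for cell in cells:
        if cell["perturbation"] == "control":
            controls[cell["batch"]].append(cell["expr"])
    control_means = {b: mean_profile(p) for b, p in controls.items()}

    deltas = []
    for cell in cells:
        if cell["perturbation"] == "control":
            continue
        ref = control_means.get(cell["batch"])
        if ref is None:
            continue  # no control in this batch: skip rather than borrow from another batch
        change = [x - r for x, r in zip(cell["expr"], ref)]
        deltas.append((cell["perturbation"], cell["batch"], change))
    return deltas

# Toy data: batch b2 carries a +1.0 offset on every gene (a batch effect),
# while the knockout shifts gene 0 by +1.0 in both batches.
cells = [
    {"batch": "b1", "perturbation": "control", "expr": [1.0, 2.0]},
    {"batch": "b1", "perturbation": "control", "expr": [1.0, 2.0]},
    {"batch": "b1", "perturbation": "KO_A", "expr": [2.0, 2.0]},
    {"batch": "b2", "perturbation": "control", "expr": [2.0, 3.0]},
    {"batch": "b2", "perturbation": "KO_A", "expr": [3.0, 3.0]},
]
deltas = batch_matched_deltas(cells)
```

With batch-matched controls, both batches recover the same change for KO_A. Subtracting a pooled control mean would instead fold the batch offset into the apparent perturbation effect.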

Topics

Transcriptomic Perturbation Prediction
Knowledge Graphs
Graph Neural Networks
Out-of-Distribution Generalization
Drug Discovery

Code references

valence-labs/TxPert

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.