TxPert: using multiple knowledge graphs for prediction of transcriptomic perturbation effects
Summary
TxPert is a deep learning method for predicting transcriptomic perturbation effects, a task central to understanding disease mechanisms and developing therapies. The model uses a latent-transfer architecture and integrates multiple knowledge graphs (KGs) of gene-gene relationships, drawing on public biological databases such as STRING and GO as well as proprietary high-throughput perturbation screens (PxMap, TxMap). For single unseen perturbations, TxPert reaches performance comparable to split-half experimental reproducibility; for double unseen perturbations and for single perturbations in different cell lines, it improves Pearson Δ by 8–25% over existing methods. The framework also introduces improved benchmarking practices, such as batch-appropriate control matching and retrieval metrics, in response to earlier deep learning models in this domain often underperforming simpler baselines. Its performance across several out-of-distribution tasks demonstrates the value of combining diverse biological knowledge graphs.
Key takeaway
For AI Scientists and Machine Learning Engineers developing computational models for drug discovery, TxPert offers a robust framework for predicting transcriptomic perturbation effects in out-of-distribution scenarios. You should consider integrating multiple biological knowledge graphs and adopting rigorous benchmarking practices, including batch-matched controls and retrieval metrics, to improve model accuracy and generalizability. This approach can accelerate the design of effective therapeutic interventions by reducing the need for exhaustive wet lab screening.
Key insights
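Combining multiple biological knowledge graphs improves transcriptomic perturbation prediction in out-of-distribution settings, with reported Pearson Δ gains of 8–25% over existing methods on unseen double perturbations and on single perturbations in different cell lines.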
Principles
- Batch-matched controls are essential for accurate biological data modeling.
- Retrieval metrics and Pearson Δ effectively assess perturbation-specific signal.
- Integrating diverse KGs provides complementary information for improved prediction.
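To make the two evaluation ideas concrete, here is a minimal NumPy sketch. It assumes Pearson Δ means the Pearson correlation between predicted and observed expression changes relative to a control mean, and it assumes a simple rank-based retrieval score (1.0 when every prediction is closest to its own perturbation, roughly 0.5 at chance). The function names `pearson_delta` and `retrieval_score` are hypothetical, and the exact metric definitions used in the TxPert paper may differ.

```python
import numpy as np

def pearson_delta(pred, obs, control_mean):
    """Pearson correlation between predicted and observed shifts from control."""
    d_pred = pred - control_mean
    d_obs = obs - control_mean
    d_pred = d_pred - d_pred.mean()
    d_obs = d_obs - d_obs.mean()
    denom = np.sqrt((d_pred ** 2).sum() * (d_obs ** 2).sum())
    return float((d_pred * d_obs).sum() / denom) if denom > 0 else float("nan")

def retrieval_score(pred_deltas, obs_deltas):
    """Mean normalized rank of the true perturbation among all candidates.

    Row i of each array is the delta profile of perturbation i.
    Assumed definition for illustration, not necessarily the paper's.
    """
    def unit_rows(x):
        # Center and scale each row so that dot products are Pearson correlations.
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    n = pred_deltas.shape[0]
    sim = unit_rows(pred_deltas) @ unit_rows(obs_deltas).T
    own = np.diag(sim)[:, None]
    # Count observed profiles that match prediction i better than its own target.
    better = (sim > own).sum(axis=1)
    return float(np.mean(1.0 - better / (n - 1)))

# Toy data: 6 perturbations, 50 genes.
rng = np.random.default_rng(0)
ctrl = rng.normal(size=50)
obs = ctrl + rng.normal(size=(6, 50))
noisy_pred = obs + 0.5 * rng.normal(size=obs.shape)

print(pearson_delta(noisy_pred[0], obs[0], ctrl))
print(retrieval_score(noisy_pred - ctrl, obs - ctrl))
```

Both quantities are computed on deltas rather than raw expression, so a model that only reproduces the control profile does not score well on either.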
Method
TxPert uses a latent-transfer deep learning architecture with a basal state encoder and a perturbation encoder, leveraging graph neural networks (GNNs) on multiple KGs to predict log-transformed gene expression profiles.
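The latent-transfer idea described above can be sketched as an untrained forward pass in NumPy. This is an illustrative toy under stated assumptions, not the authors' implementation: the layer sizes, the single GCN-style message-passing step, the averaging across graphs, the additive latent shift, and every name (`graph_a`, `graph_b`, `perturbation_encoder`, `predict`) are hypothetical, and the random graphs stand in for real KGs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, d = 8, 4  # toy sizes (assumption)

def normalize_adj(a):
    """Symmetric normalization with self-loops, as in a GCN-style layer."""
    a = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def random_graph(n, p):
    """Random undirected gene-gene graph standing in for a real KG."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

# Two placeholder graphs over the same gene set.
graphs = {"graph_a": random_graph(n_genes, 0.3),
          "graph_b": random_graph(n_genes, 0.3)}

# Untrained parameters; a real model would learn these.
W_basal = rng.normal(scale=0.1, size=(n_genes, d))
gene_emb = rng.normal(scale=0.1, size=(n_genes, d))
W_gnn = {name: rng.normal(scale=0.1, size=(d, d)) for name in graphs}
W_dec = rng.normal(scale=0.1, size=(d, n_genes))

def perturbation_encoder(perturbed):
    """One message-passing step per graph, averaged across graphs."""
    per_graph = [np.tanh(normalize_adj(a) @ gene_emb @ W_gnn[name])
                 for name, a in graphs.items()]
    h = np.mean(per_graph, axis=0)
    # Combinations are handled by summing the embeddings of the perturbed genes.
    return h[list(perturbed)].sum(axis=0)

def predict(control_expr, perturbed):
    """Encode the basal state, shift it in latent space, decode to expression."""
    z_basal = np.tanh(control_expr @ W_basal)
    z = z_basal + perturbation_encoder(perturbed)
    return z @ W_dec

control = rng.random(n_genes)          # log-transformed control profile (toy)
single = predict(control, [2])         # single perturbation of gene index 2
double = predict(control, [2, 5])      # double perturbation of genes 2 and 5
print(single.shape, double.shape)
```

Because the perturbation embedding comes from graph neighborhoods rather than from a per-perturbation lookup, a gene never perturbed in training can still receive an informative embedding, which is what makes the unseen-perturbation setting tractable in this kind of design.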
In practice
- Use batch-matched controls to mitigate experimental batch effects.
- Employ Pearson Δ and retrieval metrics for robust model evaluation.
- Integrate multiple biological KGs (e.g., STRING, GO, PxMap, TxMap) for enhanced prediction.
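As a rough illustration of the first practice above, the sketch below computes each sample's expression change against the mean control of its own batch instead of a global control mean. The function name `batch_matched_deltas` and the toy data are hypothetical; real pipelines may match controls on additional covariates.

```python
import numpy as np

def batch_matched_deltas(expr, batch, is_control):
    """Subtract the mean control profile of each sample's own batch.

    expr: (n_samples, n_genes) expression matrix
    batch: (n_samples,) batch labels
    is_control: (n_samples,) boolean mask of control samples
    Samples in batches without any control are left as NaN.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    is_control = np.asarray(is_control, dtype=bool)
    deltas = np.full_like(expr, np.nan)
    for b in np.unique(batch):
        in_batch = batch == b
        ctrl = in_batch & is_control
        if not ctrl.any():
            continue
        deltas[in_batch] = expr[in_batch] - expr[ctrl].mean(axis=0)
    return deltas

# Toy data: batch "B" carries a +5 offset on every gene; the perturbation
# raises gene 0 by 1 in both batches.
expr = np.array([[1.0, 2.0],   # batch A, control
                 [2.0, 2.0],   # batch A, perturbed
                 [6.0, 7.0],   # batch B, control
                 [7.0, 7.0]])  # batch B, perturbed
batch = np.array(["A", "A", "B", "B"])
is_control = np.array([True, False, True, False])

matched = batch_matched_deltas(expr, batch, is_control)
global_delta = expr - expr[is_control].mean(axis=0)

print(matched[[1, 3]])       # both perturbed rows are [1., 0.]
print(global_delta[[1, 3]])  # rows differ because the batch offset leaks in
```

With a global control, the batch offset is indistinguishable from a perturbation effect, which is the failure mode batch matching is meant to avoid.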
Topics
- Transcriptomic Perturbation Prediction
- Knowledge Graphs
- Graph Neural Networks
- Out-of-Distribution Generalization
- Drug Discovery
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.