‘Virtual cells’ aim to turn raw data into predictive models of biology
Summary
Researchers are developing "virtual cells," computational models designed to simulate cellular environments and predict responses to external triggers, aiming to accelerate hypothesis generation for disease understanding and therapeutic intervention. While early "virtual cell 1.0" models used differential equations, the artificial intelligence revolution has significantly advanced the field. Teams are now utilizing vast transcriptomic and other molecular datasets, such as the Arc Institute's scBaseCount with approximately 0.5 billion cells and Xaira Therapeutics' Pisces with 25.6 million cells, to build deep-learning foundation models. Notable examples include Stack2, which predicted drug effects across 28 human tissues, and X-Cell, which predicted immune T cell activation mechanisms. Despite challenges like noisy single-cell RNA sequencing data and the current focus on simpler cell lines, new tools like Systema and the State model are improving perturbation prediction accuracy, with State achieving a 33% accuracy rate compared to 7% for conventional methods.
Key takeaway
If you are a research scientist developing predictive biological models, prioritize integrating diverse, causal perturbation datasets with deep-learning foundation models. Focus on employing tools like Systema or the State model to filter noise and identify perturbation-specific effects, as this significantly improves prediction accuracy over conventional statistical methods. This approach can help you move beyond static cell states and develop more robust simulations for therapeutic discovery, such as identifying novel drug targets or understanding disease progression.
Key insights
Virtual cells utilize AI and vast datasets to simulate cellular behavior for disease and drug discovery.
Principles
- Cells are complex but highly structured systems.
- AI excels at exploring combinatorial biological space.
- Causal data is crucial for building causal models.
Method
Build deep-learning foundation models from large-scale transcriptomic and perturbation datasets, drawing on collections like scBaseCount for training data and on tools like Systema and the State model to reduce noise and isolate perturbation-specific effects.
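To make "perturbation-specific effects" concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm used by Systema or State, whose internals this summary does not describe. It assumes a simple stand-in: measure each perturbation's mean expression shift relative to control cells, then subtract the shift shared by all perturbations. The function name, the labels, and the toy expression values are all hypothetical.

```python
import numpy as np


def perturbation_specific_effects(expr, labels, control_label="control"):
    """Split mean expression shifts into shared and perturbation-specific parts.

    Hypothetical helper for illustration only; a simple mean-shift baseline,
    not the method of any published tool.
    """
    expr = np.asarray(expr, dtype=float)   # shape: (cells, genes)
    labels = np.asarray(labels)            # one label per cell

    # Reference point: average expression of unperturbed (control) cells.
    control_mean = expr[labels == control_label].mean(axis=0)

    perturbations = sorted(set(labels.tolist()) - {control_label})

    # Raw effect: mean of perturbed cells minus the control mean.
    raw = {p: expr[labels == p].mean(axis=0) - control_mean
           for p in perturbations}

    # Shared component: the average shift across all perturbations,
    # i.e. what changes regardless of which perturbation was applied.
    shared = np.mean([raw[p] for p in perturbations], axis=0)

    # Specific component: what remains once the shared shift is removed.
    specific = {p: raw[p] - shared for p in perturbations}
    return raw, shared, specific


# Toy data (invented): 3 genes, 2 cells per condition.
# Both perturbations raise gene 0 by the same amount.
expr = np.array([
    [1.0, 1.0, 1.0],  # control
    [1.0, 1.0, 1.0],  # control
    [3.0, 2.0, 1.0],  # pertA
    [3.0, 2.0, 1.0],  # pertA
    [3.0, 1.0, 2.0],  # pertB
    [3.0, 1.0, 2.0],  # pertB
])
labels = ["control", "control", "pertA", "pertA", "pertB", "pertB"]

raw, shared, specific = perturbation_specific_effects(expr, labels)
for name, effect in specific.items():
    print(name, effect)
```

In this toy case both perturbations shift gene 0 equally, so that shift lands in the shared component and the specific effect on gene 0 is zero for each. A model that only learned the shared shift would look accurate on raw effects while saying nothing about what distinguishes one perturbation from another.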
In practice
- Model cancer biology using the PhysiCell framework.
- Predict drug treatment effects across human tissues.
- Identify putative T-cell inactivators.
Topics
- Virtual Cells
- Computational Biology
- AI Foundation Models
- Transcriptomics
- Perturbation Analysis
- Drug Discovery
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.