‘Virtual cells’ aim to turn raw data into predictive models of biology

2026-06-02 · Source: Machine learning : nature.com subject feeds · Field: Science & Research — Life Sciences & Biology, Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, medium

Summary

Researchers are developing "virtual cells": computational models that simulate the cellular environment and predict how cells respond to external triggers. The aim is to speed up hypothesis generation for understanding disease and designing therapies. Early "virtual cell 1.0" models relied on differential equations, and advances in artificial intelligence have since moved the field forward considerably. Teams now train deep-learning foundation models on large transcriptomic and other molecular datasets, such as the Arc Institute's scBaseCount (about 0.5 billion cells) and Xaira Therapeutics' Pisces (25.6 million cells). Notable examples include Stack2, which predicted drug effects across 28 human tissues, and X-Cell, which predicted the mechanisms of T-cell activation.

Challenges remain. Single-cell RNA sequencing data are noisy, and most work still focuses on relatively simple cell lines. Even so, newer tools such as Systema and the State model are improving perturbation-prediction accuracy: State achieved 33% accuracy, compared with 7% for conventional methods.
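The differential-equation style of "virtual cell 1.0" models can be illustrated with a single toy equation. The sketch below is hypothetical: the step stimulus, the rate constants `k_syn` and `k_deg`, and the function names are assumptions, not taken from the article. An external trigger switches on at t = 1 and induces a response protein X, which is degraded at a constant rate, dX/dt = k_syn * S(t) - k_deg * X.

```python
"""Toy 'virtual cell 1.0'-style model: one ordinary differential equation.

Hypothetical setup (not from the article): a stimulus S(t) switches on at
t_on and induces a response protein X, which is degraded at a constant rate:

    dX/dt = k_syn * S(t) - k_deg * X
"""


def stimulus(t: float, t_on: float = 1.0) -> float:
    """Step input: 0 before t_on, 1 afterwards (the 'external trigger')."""
    return 1.0 if t >= t_on else 0.0


def simulate(k_syn: float = 2.0, k_deg: float = 0.5,
             t_end: float = 20.0, dt: float = 0.01) -> list[tuple[float, float]]:
    """Integrate the ODE with forward Euler; returns (time, X) pairs."""
    x, t = 0.0, 0.0
    trajectory = [(t, x)]
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        dxdt = k_syn * stimulus(t) - k_deg * x
        x += dt * dxdt
        t += dt
        trajectory.append((t, x))
    return trajectory


if __name__ == "__main__":
    traj = simulate()
    t_final, x_final = traj[-1]
    # Under a constant stimulus, X approaches the steady state k_syn / k_deg.
    print(f"X({t_final:.1f}) = {x_final:.3f}")
```

Forward Euler is used only for brevity. Real mechanistic models typically couple many such equations and use an adaptive ODE solver.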

Key takeaway

If you are a research scientist developing predictive biological models, prioritize integrating diverse, causal perturbation datasets with deep-learning foundation models. Use tools such as Systema or the State model to filter out noise and isolate perturbation-specific effects, since this significantly improves prediction accuracy over conventional statistical methods. This approach lets you move beyond static cell states toward more robust simulations for therapeutic discovery, such as identifying novel drug targets or understanding disease progression.

Key insights

Virtual cells use AI and large datasets to simulate cellular behavior for disease research and drug discovery.

Principles

Cells are complex but highly structured systems.
AI excels at exploring combinatorial biological space.
Causal data is crucial for building causal models.
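The principle that causal models need causal (interventional) data can be illustrated with a toy simulation. Everything here is hypothetical: a hidden cell state drives two genes, X and Y, and X has no direct effect on Y. Regressing Y on X in observational data yields a large spurious slope. When X is instead set externally, as in a perturbation experiment, the estimated effect is close to zero. The function names, coefficients, and sample size are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n: int, intervene: bool, seed: int = 0) -> tuple[list[float], list[float]]:
    """Hypothetical two-gene system with a hidden confounder.

    A latent cell state drives both gene X and gene Y; X has NO direct
    effect on Y. With intervene=True, X is set externally (as in a
    perturbation screen), which breaks its link to the latent state.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        state = rng.gauss(0.0, 1.0)  # unobserved confounder
        if intervene:
            x = rng.gauss(0.0, 1.0)  # X set independently of the state
        else:
            x = 2.0 * state + rng.gauss(0.0, 0.5)
        y = 3.0 * state + rng.gauss(0.0, 0.5)  # Y does not depend on X
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    observational = slope(*simulate(10_000, intervene=False))
    interventional = slope(*simulate(10_000, intervene=True))
    print(f"observational slope:  {observational:.2f}")  # spurious, far from 0
    print(f"interventional slope: {interventional:.2f}")  # near the true effect, 0
```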

Method

Build deep-learning foundation models on large-scale transcriptomic and perturbation datasets. Draw training data from resources such as scBaseCount, and use Systema or State to reduce noise and isolate perturbation-specific effects.
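Preparing perturbation data often starts by converting per-cell counts into a per-perturbation expression shift relative to unperturbed controls. The sketch below shows one such preprocessing scheme: library-size normalization, then log1p, then pseudobulk means. It is an assumption, not the pipeline used by any model named above. The control label `non-targeting` and the toy counts are hypothetical. The sketch also assumes that every cell has a nonzero total count and that control cells are present.

```python
import math
from collections import defaultdict
from collections.abc import Iterable
from statistics import fmean

CONTROL = "non-targeting"  # hypothetical label for unperturbed control cells


def normalize(counts: list[int], target_sum: float = 1e4) -> list[float]:
    """Library-size normalize one cell's counts, then apply log1p."""
    total = sum(counts)  # assumed nonzero
    return [math.log1p(c * target_sum / total) for c in counts]


def perturbation_deltas(cells: Iterable[tuple[str, list[int]]]) -> dict[str, list[float]]:
    """Mean normalized expression per perturbation minus the control mean.

    `cells` is an iterable of (perturbation_label, raw_counts) pairs.
    """
    groups: dict[str, list[list[float]]] = defaultdict(list)
    for label, counts in cells:
        groups[label].append(normalize(counts))

    def mean_profile(profiles: list[list[float]]) -> list[float]:
        # Pseudobulk: average each gene across the cells in a group.
        return [fmean(gene) for gene in zip(*profiles)]

    control = mean_profile(groups.pop(CONTROL))  # KeyError if no controls
    return {
        label: [p - c for p, c in zip(mean_profile(profiles), control)]
        for label, profiles in groups.items()
    }


if __name__ == "__main__":
    toy_cells = [
        ("non-targeting", [10, 10, 0]),
        ("non-targeting", [12, 8, 0]),
        ("KO_gene1", [1, 19, 0]),
        ("KO_gene1", [0, 20, 0]),
    ]
    deltas = perturbation_deltas(toy_cells)
    print({k: [round(v, 2) for v in vals] for k, vals in deltas.items()})
```

Shifts computed this way are a typical prediction target for perturbation models. In practice, libraries such as Scanpy provide optimized versions of these normalization steps.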

In practice

Model cancer biology with the PhysiCell framework.
Predict drug treatment effects across human tissues.
Identify putative T-cell inactivators.

Topics

Virtual Cells
Computational Biology
AI Foundation Models
Transcriptomics
Perturbation Analysis
Drug Discovery

Code references

ArcInstitute/arc-virtual-cell-atlas

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.