Discovery of candidate therapeutic targets with Geneformer

Source: Machine learning : nature.com subject feeds · Field: Science & Research — Life Sciences & Biology, Health & Medical Research · Depth: Intermediate, medium

Summary

Geneformer is a foundational artificial intelligence model designed to identify gene networks disrupted in disease, particularly in data-limited settings. Pretrained on over 100 million single-cell transcriptomes, it enables context-aware predictions in network biology. The protocol first tokenizes raw gene expression counts into rank value encodings, then assesses phenotype separability with zero-shot embeddings, and then fine-tunes the model. Fine-tuning can be single-task, such as disease prediction within a specific cell type, or multi-task, in which the model learns cross-informative features such as cell types and disease states together. Performance is evaluated using confusion matrices, macro F1 scores, and embedding analysis. The protocol also supports in silico perturbation, which simulates gene repression or activation and quantifies the resulting shift in cell state to prioritize therapeutic targets. The full workflow typically completes in under 2 days on a standard GPU workstation and requires moderate Python experience; quantized models are available as an option for greater efficiency.
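
The tokenization step mentioned above can be illustrated with a minimal sketch. This is not Geneformer's own tokenizer: the function name, the `max_len` parameter, the gene names, and the scaling factors are all hypothetical, and the normalization (divide by the cell's total counts, then by a per-gene scaling factor) is an assumption about how a rank value encoding is commonly built. The point is only that each cell becomes a list of gene tokens ordered by relative expression rather than a vector of raw counts.

```python
import numpy as np

def rank_value_encode(counts, gene_ids, gene_scale, max_len=2048):
    """Turn one cell's raw counts into gene IDs ordered by scaled expression.

    counts     : raw counts for one cell, one entry per gene
    gene_ids   : gene identifiers aligned with counts
    gene_scale : per-gene scaling factors (assumed here to be each gene's
                 typical nonzero expression in some reference corpus)
    max_len    : hypothetical cap on the number of tokens kept
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return []  # an empty cell has no tokens
    # Normalize by sequencing depth, then by the gene's typical level,
    # so ubiquitously high genes do not dominate the top ranks.
    scaled = counts / total / np.asarray(gene_scale, dtype=float)
    detected = np.flatnonzero(counts > 0)  # undetected genes get no token
    order = detected[np.argsort(-scaled[detected], kind="stable")]
    return [gene_ids[i] for i in order[:max_len]]

# Toy cell with made-up values: GENE_A has the highest raw count but also
# a large scaling factor, so it ends up ranked last among detected genes.
tokens = rank_value_encode(
    counts=[10, 0, 5, 1],
    gene_ids=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    gene_scale=[20.0, 1.0, 1.0, 0.5],
)
print(tokens)  # ['GENE_C', 'GENE_D', 'GENE_A']
```

Because only the ordering is kept, the encoding is insensitive to sequencing depth, which is one plausible reason a rank-based representation suits pooled data from many experiments.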

Key takeaway

For AI Scientists and Research Scientists working on therapeutic target discovery with limited single-cell data, Geneformer offers a pretrained model that can be adapted to a specific disease context without a large labeled dataset. Its context-aware predictions and in silico perturbations can accelerate the identification of gene networks disrupted in disease. You should explore its fine-tuning capabilities for specific disease contexts and consider the quantized models for rapid prototyping.

Key insights

Geneformer leverages a transformer model and transfer learning for gene network analysis and therapeutic target discovery with limited data.

Method

The Geneformer protocol proceeds in four steps: tokenizing gene expression, assessing zero-shot embeddings, fine-tuning (single-task or multi-task), and running in silico perturbation to prioritize therapeutic targets.
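
The in silico perturbation step can be sketched on a toy example. Everything here is illustrative and hypothetical: `toy_embed` is a stand-in for a trained model's cell embedding, the gene names and vectors are invented, and modeling repression as removing a gene from the ranked token list (and activation as moving it to the top rank) is an assumption made for this sketch, not a statement of the library's API. The sketch shows only the scoring idea: perturb the cell, re-embed it, and measure whether it moved toward a goal state.

```python
import numpy as np

def repress(tokens, gene):
    """Simulate repression: drop the gene from the rank-ordered tokens."""
    return [t for t in tokens if t != gene]

def activate(tokens, gene):
    """Simulate activation: move the gene to the top rank."""
    return [gene] + [t for t in tokens if t != gene]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def toy_embed(tokens, gene_vectors):
    """Toy stand-in for a model embedding: rank-weighted mean of gene vectors."""
    weights = 1.0 / np.arange(1, len(tokens) + 1)  # higher rank, larger weight
    vecs = np.array([gene_vectors[t] for t in tokens])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def state_shift(tokens, perturbed, goal, embed):
    """Change in cosine similarity to the goal state after perturbation."""
    return cosine(embed(perturbed), goal) - cosine(embed(tokens), goal)

# Invented 2-D gene vectors and an invented "healthy" goal embedding.
gene_vectors = {
    "GENE_A": np.array([1.0, 0.0]),
    "GENE_B": np.array([0.0, 1.0]),
    "GENE_C": np.array([1.0, 1.0]),
}
goal = np.array([0.0, 1.0])
diseased_cell = ["GENE_A", "GENE_B", "GENE_C"]

def embed(tokens):
    return toy_embed(tokens, gene_vectors)

# Score the repression of each gene and rank candidates by the shift.
shifts = {
    g: state_shift(diseased_cell, repress(diseased_cell, g), goal, embed)
    for g in diseased_cell
}
best = max(shifts, key=shifts.get)
print(best, shifts)
```

A positive shift means the perturbed cell sits closer to the goal state than the unperturbed one, so genes with the largest positive shift would be the candidates to examine first. In a real analysis the embedding would come from the fine-tuned model and the goal state from embeddings of reference cells.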

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.