Scoring gene importance by interpreting single-cell foundation models
Summary
SIGnature is a framework for scoring gene importance in single-cell RNA-sequencing (scRNA-seq) data, motivated by the unreliability of absolute gene expression levels. It adapts attribution methods from explainable AI (XAI) to single-cell foundation models (FMs), quantifying each gene's influence on a cell's position in the model's latent space. The associated SIGnature package supports rapid gene set searches across large scRNA-seq atlases, querying 22 million cells in minutes. The method reduces technical noise, prioritizes regulatory genes such as transcription factors, and enables robust cross-dataset comparisons. As a demonstration, a search across 400 studies linked the MS1 monocyte signature, known to be activated in severe COVID-19 and sepsis, to hyperinflammatory conditions not previously associated with it, such as Kawasaki disease (KD). Experimental validation confirmed that serum from KD patients can induce the MS1 phenotype in vitro.
Key takeaway
For research scientists analyzing scRNA-seq data, the SIGnature framework can improve the identification of functionally important genes and make comparisons of gene signatures across diverse datasets more robust. Attribution-based scoring is worth considering as a way to reduce the impact of technical noise and uncover shared disease mechanisms, as shown by the link it established between the MS1 signature and Kawasaki disease and other hyperinflammatory conditions. Compared with scoring on raw expression levels, the approach is presented as faster at atlas scale and more reliable across datasets.
Key insights
SIGnature uses XAI attributions on scRNA-seq foundation models to robustly score gene importance and enable cross-dataset biological discovery.
Principles
- Attribution scores reduce technical noise.
- Attributions highlight regulatory genes.
- FMs enable cross-dataset comparisons.
Method
SIGnature selects a scRNA-seq FM, computes attributions using gradient-based XAI (e.g., Integrated Gradients, Input x Gradient, DeepLIFT) with a summation layer for multidimensional embeddings, then aggregates scores for gene set analysis.
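The attribution step can be illustrated with a minimal sketch. It is not the SIGnature implementation: the encoder below is a hypothetical one-layer tanh network standing in for a foundation model, and the names `encode`, `summed_output`, `input_x_gradient`, and `integrated_gradients` are assumptions made for illustration. It shows how summing over the embedding dimensions yields a single scalar that gradient-based methods can attribute back to each gene.

```python
import numpy as np

# Hypothetical toy encoder: 6 genes -> 3 latent dimensions.
# A real foundation model would replace this; the weights are random.
rng = np.random.default_rng(0)
n_genes, n_latent = 6, 3
W = rng.normal(size=(n_latent, n_genes))


def encode(x):
    """Map an expression vector to a latent embedding."""
    return np.tanh(W @ x)


def summed_output(x):
    """Summation layer: collapse the embedding into one scalar target."""
    return encode(x).sum()


def grad_summed_output(x):
    """Analytic gradient of the summed embedding w.r.t. each input gene."""
    z = W @ x
    return W.T @ (1.0 - np.tanh(z) ** 2)


def input_x_gradient(x):
    """Input x Gradient: expression value times local gradient."""
    return x * grad_summed_output(x)


def integrated_gradients(x, baseline=None, steps=200):
    """Integrated Gradients via a midpoint Riemann sum along a straight path."""
    if baseline is None:
        baseline = np.zeros_like(x)  # all-zero expression as the reference
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [grad_summed_output(baseline + a * (x - baseline)) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)


# One toy cell with three expressed and three unexpressed genes.
x = np.array([0.0, 1.2, 0.0, 0.4, 2.0, 0.0])
ixg = input_x_gradient(x)
ig = integrated_gradients(x)
print("Input x Gradient:    ", np.round(ixg, 3))
print("Integrated Gradients:", np.round(ig, 3))
```

In this toy setting the Integrated Gradients scores sum to the change in the summed embedding between the baseline and the cell, which is a convenient sanity check when swapping in a real model.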
In practice
- Query gene signatures across 22 million cells.
- Identify shared disease mechanisms.
- Generate testable biological hypotheses.
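A gene set query of this kind can be sketched as follows. This is an illustrative assumption about how per-gene attributions might be aggregated, not the SIGnature package's API: the helper names `score_gene_set` and `top_cells`, the placeholder gene names, the mean aggregation, and the toy attribution matrix are all hypothetical.

```python
import numpy as np

# Hypothetical attribution matrix: rows are cells, columns are genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
attributions = np.array([
    [0.9, 0.1, 0.8, 0.0],
    [0.1, 0.7, 0.0, 0.2],
    [0.5, 0.0, 0.6, 0.1],
])


def score_gene_set(attributions, genes, gene_set):
    """Mean attribution over the query genes present in the vocabulary."""
    index = {g: i for i, g in enumerate(genes)}
    cols = [index[g] for g in gene_set if g in index]
    if not cols:
        raise ValueError("none of the query genes are in the vocabulary")
    return attributions[:, cols].mean(axis=1)


def top_cells(scores, k):
    """Indices of the k highest-scoring cells, best first."""
    return np.argsort(-scores)[:k]


# GENE_X is absent from the vocabulary and is silently skipped.
scores = score_gene_set(attributions, genes, ["GENE_A", "GENE_C", "GENE_X"])
ranked = top_cells(scores, k=2)
print(scores, ranked)
```

Because the per-cell score is a column slice followed by a mean, the same query scales to a large precomputed attribution matrix without re-running the model.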
Topics
- Single-cell RNA-sequencing
- Foundation Models
- Explainable AI
- Gene Importance Scoring
- Cross-disease Analysis
- Kawasaki Disease
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.