scLong: a billion-parameter foundation model for capturing long-range gene context in single-cell transcriptomics

Source: Machine learning : nature.com subject feeds · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Computational Biology) · Depth: Expert, long

Summary

scLong is a billion-parameter foundation model for single-cell RNA sequencing (scRNA-seq) data, pretrained on 48 million cells. Published in Nature Communications on February 5, 2026, it addresses limitations of existing models by performing self-attention across all 28,000 genes in the human genome. This lets scLong capture long-range dependencies between genes, including lowly expressed genes that other models often overlook but that are critical for cellular processes. scLong also integrates external gene knowledge from the Gene Ontology through a graph convolutional network, which improves its representation of gene functions. In the reported evaluations, scLong outperforms both current state-of-the-art scRNA-seq foundation models and task-specific models on applications such as predicting transcriptional responses to perturbations, forecasting cancer drug responses, and inferring gene regulatory networks.

Key takeaway

For AI researchers and computational biologists working with single-cell transcriptomics, scLong attends over all 28,000 human genes and integrates external biological knowledge from the Gene Ontology, so analyses can account for cellular processes driven by lowly expressed genes that earlier foundation models tend to overlook. Consider adopting scLong for drug response prediction and gene regulatory network inference, where it is reported to outperform existing models.

Key insights

scLong is a billion-parameter foundation model for scRNA-seq, capturing long-range gene dependencies and integrating external gene knowledge.

Method

scLong applies self-attention across all 28,000 human genes to capture long-range dependencies, and it uses a graph convolutional network over the Gene Ontology to bring external knowledge of gene functions into the model.
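The two components named above can be sketched on a toy scale. The block below is a minimal NumPy illustration, not scLong's implementation: the graph construction (an edge when two genes share a Gene Ontology annotation), the way expression values are combined with gene embeddings, the single attention head, and every name and size are assumptions made for illustration only.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weight, 0.0)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over every gene token."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_genes, dim = 6, 4  # toy sizes; the summary describes 28,000 genes

# Hypothetical gene graph: an edge means two genes share a GO annotation.
adj = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

# Gene embeddings informed by the graph (random initial features, assumed).
a_norm = normalize_adjacency(adj)
gene_emb = gcn_layer(a_norm, rng.normal(size=(n_genes, dim)),
                     rng.normal(size=(dim, dim)))

# One cell's expression values, including low ones; every gene keeps a token.
expression = rng.random(n_genes)
w_expr = rng.normal(size=dim)  # assumed projection of a scalar expression value
tokens = gene_emb + expression[:, None] * w_expr[None, :]

# Attention over all genes: each gene can attend to every other gene.
context, attn = self_attention(
    tokens, *(rng.normal(size=(dim, dim)) for _ in range(3))
)
print(context.shape, attn.shape)
```

The attention matrix here has one row and one column per gene, so its size grows with the square of the number of genes. That quadratic growth is what makes attending over the full gene set expensive compared with attending over a smaller selection.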

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.