Orthrus: toward evolutionary and functional RNA foundation models
Summary
Orthrus is a Mamba-based foundation model for mature RNA, designed to predict key RNA properties and functions by leveraging biological domain knowledge. Unlike existing models that adapt strategies from the text domain, Orthrus uses a self-supervised contrastive learning objective with biological augmentations. It maximizes embedding similarity between splice isoforms from ten model organisms and between orthologous genes across more than 400 mammalian species. This training approach creates latent representations that cluster RNA sequences by functional and evolutionary similarity. Orthrus's mature RNA isoform representations outperform other genomic foundation models on mRNA property prediction tasks while requiring significantly less fine-tuning data. The model also captures the divergent biological functions of individual transcript isoforms. Its code and pretrained models are publicly available on GitHub, Zenodo, and Hugging Face.
Key takeaway
For AI Scientists and Machine Learning Engineers developing genomic foundation models, Orthrus demonstrates that incorporating biological domain knowledge through contrastive learning improves performance on RNA property prediction. Consider adopting similar biologically informed pretraining strategies to improve model accuracy and reduce fine-tuning data requirements, especially when working with complex biological sequences such as RNA.
Key insights
Orthrus is a Mamba-based RNA foundation model using contrastive learning and biological augmentations for superior RNA property prediction.
Principles
- Biological domain knowledge enhances RNA foundation models.
- Contrastive learning can cluster RNA by function and evolution.
- Mamba architecture is effective for RNA sequence modeling.
Method
Orthrus is pretrained with a self-supervised contrastive learning objective. Its biological augmentations are splice isoforms from ten model organisms and orthologous genes from more than 400 mammalian species: the objective maximizes embedding similarity between these related transcripts.
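To make the contrastive objective concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an illustration only: the summary does not name the exact loss Orthrus uses, and the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions introduced for this example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of (anchor, positive) embedding pairs.

    anchors[i] and positives[i] are embeddings of two related transcripts
    (for example, two splice isoforms of one gene, or two orthologs).
    Every positives[j] with j != i acts as a negative for anchors[i].
    The temperature of 0.1 is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i in range(len(a)):
        logits = [dot(a[i], p[j]) / temperature for j in range(len(p))]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings (hypothetical): row i of each batch belongs to the same gene.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
# Same vectors, but the pairing is swapped so each anchor faces the wrong positive.
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

The loss is small when each anchor is closest to its own positive and large when the pairing is wrong, which is the pressure that pulls related transcripts together in embedding space.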
In practice
- Utilize Orthrus for mRNA property prediction tasks.
- Explore Orthrus embeddings for functional RNA clustering.
- Access pretrained models on Hugging Face for inference.
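As a rough illustration of exploring embeddings for functional clustering, the sketch below ranks transcripts by cosine similarity to a query transcript. It assumes embeddings have already been computed and stored as plain lists of floats. The transcript identifiers, the three-dimensional vectors, and the helper names are hypothetical, and no Orthrus loading API is shown because the summary does not describe one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_transcripts(query_id, embeddings, k=2):
    """Return the k transcript IDs most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), tid)
        for tid, vec in embeddings.items()
        if tid != query_id
    ]
    scored.sort(reverse=True)
    return [tid for _, tid in scored[:k]]

# Hypothetical toy embeddings; real model embeddings are much higher-dimensional.
toy_embeddings = {
    "geneA_isoform1": [0.9, 0.1, 0.0],
    "geneA_isoform2": [0.8, 0.2, 0.1],
    "geneB_isoform1": [0.0, 0.1, 0.9],
    "geneB_isoform2": [0.1, 0.0, 0.8],
}

neighbors = nearest_transcripts("geneA_isoform1", toy_embeddings, k=1)
print(neighbors)
```

In this toy data the isoforms of one gene sit close together by construction. With real embeddings, isoforms with divergent functions may land far apart, which is itself a signal worth inspecting.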
Topics
- RNA Foundation Models
- Mamba Architecture
- Contrastive Learning
- Transcript Isoform Function
- mRNA Property Prediction
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.