Move over, AlphaFold: open-source model predicts shape of 1 billion proteins
Summary
The Chan Zuckerberg Initiative's Biohub has unveiled the ESM Atlas, a database of 1.1 billion predicted protein structures and 6.8 billion protein sequences generated with a new artificial-intelligence tool. The atlas exceeds Google DeepMind's AlphaFold database by more than 800 million entries and the previous ESM Atlas by 300 million. The predictions were made using ESMFold2, an open-source AI model that Biohub claims outperforms AlphaFold3, the latest version of Google DeepMind's system, at protein-structure prediction. ESMFold2 is based on a 'protein language' model trained on billions of proteins, including metagenomic sequences. Researchers demonstrated its capability by designing new antibodies and proteins that bind to targets implicated in cancers and immunological conditions, with a high success rate in lab tests. The atlas aims to facilitate discovery by connecting known and unknown parts of the protein universe. As one example, it revealed structural similarities between CRISPR proteins and a gene-editing protein identified in 2023.
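The size comparison in the summary implies approximate sizes for the two older databases. The following is a minimal sketch of that arithmetic, assuming the 1.1 billion structure count is the baseline for both comparisons (the summary does not say so explicitly). The constant and function names are illustrative only and do not come from the original article.

```python
# Figures quoted in the summary, in entries.
ATLAS_STRUCTURES = 1_100_000_000          # "1.1 billion predicted structures"
MARGIN_OVER_ALPHAFOLD_DB = 800_000_000    # "more than 800 million" (a lower bound)
MARGIN_OVER_PREVIOUS_ATLAS = 300_000_000  # "by 300 million"


def implied_size(total: int, margin: int) -> int:
    """Size of the smaller database implied by a total and a stated margin."""
    return total - margin


# The AlphaFold margin is "more than" 800 million, so this value is an upper bound.
alphafold_db_upper_bound = implied_size(ATLAS_STRUCTURES, MARGIN_OVER_ALPHAFOLD_DB)
# The previous-atlas margin is quoted as a round figure, so this is an estimate.
previous_atlas_estimate = implied_size(ATLAS_STRUCTURES, MARGIN_OVER_PREVIOUS_ATLAS)

if __name__ == "__main__":
    print(f"AlphaFold database: fewer than {alphafold_db_upper_bound:,} entries")
    print(f"Previous ESM Atlas: roughly {previous_atlas_estimate:,} entries")
```

Under that assumption, the quoted margins put the AlphaFold database below about 300 million entries and the previous ESM Atlas at roughly 800 million.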
Key takeaway
For research scientists and machine learning engineers focused on protein engineering or drug discovery, the open-source ESMFold2 and its ESM Atlas offer a substantial new resource. You should explore the atlas of 1.1 billion predicted protein structures to accelerate novel protein design and identify structural similarities between proteins. Because the model is open source, it offers an alternative to proprietary tools and may support faster hypothesis generation and experimental validation in therapeutic development.
Key insights
ESMFold2, an open-source AI model, has been used to predict more than a billion protein structures. Biohub claims it outperforms AlphaFold3, and researchers have used it to design novel proteins.
Principles
- Protein language models excel at structure prediction.
- Metagenomic data expands protein universe understanding.
- Open-source AI fosters broad scientific discovery.
Method
ESMFold2 is built on a protein language model trained on billions of proteins, including metagenomic sequences, and predicts 3D structures and interaction complexes. Researchers used it to design novel antibodies and proteins that bind specific targets.
In practice
- Access the ESM Atlas for 1.1 billion predicted structures.
- Use ESMFold2 to design targeted antibody molecules.
- Explore metagenomic sequences for novel protein functions.
Topics
- Protein Structure Prediction
- ESMFold2
- ESM Atlas
- AI Models
- Antibody Design
- Metagenomics
- Open-Source AI
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.