Move over, AlphaFold: open-source model predicts shape of 1 billion proteins

Source: Machine learning : nature.com subject feeds · Field: Science & Research — Artificial Intelligence & Machine Learning, Life Sciences & Biology, Health & Medical Research · Depth: Expert, quick

Summary

The Chan Zuckerberg Initiative's Biohub has unveiled the ESM Atlas, an atlas of more than one billion predicted protein structures and billions more protein sequences, generated by a new artificial-intelligence tool. It exceeds Google DeepMind's AlphaFold database by more than 800 million entries and a previous ESM Atlas by 300 million. The predictions were made using ESMFold2, an open-source AI model that Biohub claims outperforms AlphaFold3, the latest version of Google DeepMind's system, at protein-structure prediction. ESMFold2 is built on a 'protein language' model trained on billions of proteins, including metagenomic sequences. Researchers demonstrated its capabilities by designing new antibodies and proteins that bind to targets implicated in cancers and immunological conditions, with a high success rate in lab tests. The atlas, which contains 1.1 billion predicted structures and 6.8 billion protein sequences, aims to aid discovery by connecting known and unknown parts of the protein universe. One example is the structural similarity found between CRISPR proteins and a gene-editing protein identified in 2023.

Key takeaway

For research scientists and machine learning engineers working on protein engineering or drug discovery, the open-source ESMFold2 and the ESM Atlas are a substantial new resource. You can explore the atlas's 1.1 billion predicted protein structures to look for structural similarities and to find starting points for novel protein design. Because the model is open source, it offers an alternative to proprietary tools for generating hypotheses and planning experimental validation in therapeutic development.

Key insights

ESMFold2, an open-source AI model, was used to predict more than a billion protein structures. Biohub claims it outperforms AlphaFold3, and researchers have used it to design novel proteins.

Method

ESMFold2, built on a protein language model trained on billions of proteins, predicts 3D structures and interaction complexes. Researchers used it to design novel antibodies and proteins that bind to specific targets.

Topics

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.