AF2BIND: predicting small-molecule binding sites using the pair representation of AlphaFold2
Summary
AF2BIND is a logistic regression model for de novo prediction of small-molecule binding sites in proteins, built on features from the pretrained AlphaFold2 (AF2) neural network. Unlike traditional methods, it does not rely on homology modeling, multiple sequence alignments, or prior knowledge of a pocket-compatible ligand. Instead, it uses AF2's internal pair representation, with 20 "bait" amino acids supplied as individual chains alongside the target to tease out ligand-binding signals. The model achieves a 66% binding-residue recovery rate and a 0.936 ROC AUC, outperforming other single-representation features such as ESM2 and ESM1-IF. Applied to the human proteome, AF2BIND identified over 20,000 binding sites, including thousands that neither homology-based methods nor P2Rank had assigned; many of these are shallow or surface-exposed and potentially druggable.
Key takeaway
For scientists working on drug discovery, AF2BIND offers an interpretable tool for identifying novel small-molecule binding sites. Consider integrating it into early-stage drug discovery pipelines to uncover de novo ligandable sites, especially in proteins where homology-based methods or pocket finders fall short; doing so may accelerate the identification of new therapeutic targets.
Key insights
AlphaFold2's internal representations can be repurposed to accurately predict de novo small-molecule binding sites.
Principles
- Pretrained neural network features are transferable to orthogonal tasks.
- Protein-amino acid contacts can approximate protein-small molecule interactions.
- Logistic regression models offer interpretability for feature contributions.
Method
AF2BIND uses AF2's pair representation, augmented with 20 "bait" amino acids, as input to a logistic regression model. It predicts the probability of each residue contacting a small-molecule ligand, without requiring MSAs or ligand knowledge.
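The scoring step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the feature layout (one slice of 20 baits by a handful of channels per target residue), the tiny channel count, the random weights, and the names `residue_features` and `predict_binding` are all assumptions made for the example. In a real setting the features would come from AF2's pair representation and the weights from training on known binding sites.

```python
import math
import random

N_BAITS = 20     # from the summary: 20 "bait" amino acids
N_CHANNELS = 4   # hypothetical; kept tiny so the example stays readable


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residue_features(pair_rows):
    """Flatten one residue's [N_BAITS][N_CHANNELS] pair slice into a vector."""
    return [value for bait_row in pair_rows for value in bait_row]


def predict_binding(pair, weights, bias):
    """Return one ligand-contact probability per target residue.

    pair: list over residues; each entry is [N_BAITS][N_CHANNELS]
          (an assumed layout for the residue-to-bait pair features).
    """
    probs = []
    for pair_rows in pair:
        x = residue_features(pair_rows)
        logit = bias + sum(w * xi for w, xi in zip(weights, x))
        probs.append(sigmoid(logit))
    return probs


# Synthetic stand-ins for real pair features and trained weights.
rng = random.Random(0)
n_residues = 5
pair = [
    [[rng.gauss(0.0, 1.0) for _ in range(N_CHANNELS)] for _ in range(N_BAITS)]
    for _ in range(n_residues)
]
weights = [rng.gauss(0.0, 0.1) for _ in range(N_BAITS * N_CHANNELS)]
bias = -1.0

probs = predict_binding(pair, weights, bias)
```

Because the model is linear in its features, each probability is a sigmoid of a weighted sum, which is what makes the per-feature contributions easy to inspect.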
In practice
- Use AF2BIND to identify novel binding sites in disease-relevant proteins.
- Apply bait activation analysis to infer ligand chemical properties.
- Combine AF2BIND with co-structure predictors to guide docking.
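One plausible reading of the bait activation analysis mentioned above is that, in a linear model, a residue's logit splits into one additive term per bait amino acid, so the baits can be ranked by how much each contributes. The sketch below assumes this interpretation and the same hypothetical feature layout (baits by channels, flattened bait-major); the article's exact definition of bait activation may differ, and the function names are illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 baits


def bait_contributions(pair_rows, weights, n_channels):
    """Split one residue's linear score into a contribution per bait.

    pair_rows: [20][n_channels] features for one residue (assumed layout).
    weights:   flat list of length 20 * n_channels, bait-major order.
    """
    contribs = {}
    for b, aa in enumerate(AMINO_ACIDS):
        start = b * n_channels
        contribs[aa] = sum(
            weights[start + c] * pair_rows[b][c] for c in range(n_channels)
        )
    return contribs


def top_baits(contribs, k=3):
    """Return the k baits with the largest positive contribution."""
    return sorted(contribs, key=contribs.get, reverse=True)[:k]


# Toy example: only the aspartate (D) and tryptophan (W) baits carry weight.
n_channels = 2
pair_rows = [[1.0] * n_channels for _ in AMINO_ACIDS]
weights = [0.0] * (len(AMINO_ACIDS) * n_channels)
d, w = AMINO_ACIDS.index("D"), AMINO_ACIDS.index("W")
weights[d * n_channels : (d + 1) * n_channels] = [1.0, 1.0]
weights[w * n_channels : (w + 1) * n_channels] = [0.5, 0.5]

contribs = bait_contributions(pair_rows, weights, n_channels)
ranked = top_baits(contribs, k=2)  # ['D', 'W']
```

A residue dominated by charged or aromatic baits might hint at the chemistry of a compatible ligand, though any such inference should be checked against structural data.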
Topics
- Small-molecule Binding Site Prediction
- AlphaFold2
- Drug Discovery
- Protein Ligandability
- Proteome Analysis
Best for: AI Scientist, Research Scientist, AI Researcher, AI Data Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.