Sample-efficient generative molecular design using memory manipulation
Summary
Saturn, a new framework that applies the Mamba architecture to generative molecular design, substantially improves sample efficiency for drug discovery. It targets a critical challenge: directly optimizing molecules against high-fidelity but computationally expensive oracles such as Density Functional Theory (DFT) simulations, which current models are not sample-efficient enough to use. Saturn relies on experience replay with data augmentation, a mechanism whose benefit is amplified by the Mamba architecture, and outperforms 16 other models on multiparameter optimization tasks. Crucially, Saturn is sample-efficient enough to optimize directly at DFT fidelity within a budget of 500 oracle calls, which promises better generative design and higher hit rates in drug discovery. The datasets (ChEMBL 33, ZINC 250k), pretrained models, and codebase are publicly available for reproduction.
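Experience replay with data augmentation can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the Saturn codebase. `ReplayBuffer`, `BudgetedOracle`, and `toy_randomize_smiles` are hypothetical names. The buffer simply keeps the highest-scoring unique SMILES seen so far. The augmentation is a toy stand-in for SMILES randomization that is only valid for unbranched, ring-free chains of single-letter atoms; a real pipeline would use a cheminformatics toolkit such as RDKit. The oracle wrapper only illustrates accounting against a fixed budget such as 500 calls.

```python
import heapq
import random
import re
from typing import Callable, List, Tuple

# Toy augmentation (assumption): for unbranched, ring-free chains of
# single-letter atoms, reading the SMILES backwards gives the same molecule.
_LINEAR = re.compile(r"[BCNOPSFI](?:[-=#]?[BCNOPSFI])*")


def toy_randomize_smiles(smiles: str, rng: random.Random) -> str:
    """Randomly return an equivalent SMILES for simple linear chains;
    any other string is returned unchanged."""
    if _LINEAR.fullmatch(smiles) and rng.random() < 0.5:
        return smiles[::-1]
    return smiles


class ReplayBuffer:
    """Hypothetical buffer keeping the top-`capacity` unique (reward, SMILES) pairs."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, str]] = []  # min-heap: worst entry at index 0
        self._seen: set = set()

    def add(self, smiles: str, reward: float) -> None:
        if smiles in self._seen:
            return  # keep entries unique
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (reward, smiles))
            self._seen.add(smiles)
        elif reward > self._heap[0][0]:
            # Evict the lowest-reward entry in favour of the better molecule.
            _, evicted = heapq.heapreplace(self._heap, (reward, smiles))
            self._seen.discard(evicted)
            self._seen.add(smiles)

    def sample(self, k: int, augment: Callable[[str, random.Random], str],
               rng: random.Random) -> List[Tuple[str, float]]:
        """Draw up to k stored molecules and return augmented copies with rewards."""
        picked = rng.sample(self._heap, min(k, len(self._heap)))
        return [(augment(s, rng), r) for r, s in picked]

    def top(self) -> List[Tuple[float, str]]:
        return sorted(self._heap, reverse=True)

    def __len__(self) -> int:
        return len(self._heap)


class BudgetedOracle:
    """Wraps an expensive scorer (e.g. a DFT calculation) and stops at the budget."""

    def __init__(self, score_fn: Callable[[str], float], budget: int = 500) -> None:
        self.score_fn = score_fn
        self.budget = budget
        self.calls = 0

    def __call__(self, smiles: str) -> float:
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        return self.score_fn(smiles)


rng = random.Random(0)
oracle = BudgetedOracle(lambda s: float(len(s)))  # hypothetical cheap score
buf = ReplayBuffer(capacity=3)
for smi in ["CCO", "CCCO", "C=CO", "CCCCO", "CC"]:
    buf.add(smi, oracle(smi))
batch = buf.sample(2, toy_randomize_smiles, rng)
print(len(buf), oracle.calls, batch)
```

The intuition is that replaying augmented copies of high-scoring molecules lets the generator learn from each expensive oracle result more than once without spending additional oracle calls.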
Key takeaway
Saturn combines the Mamba architecture with experience replay and data augmentation for generative molecular design, substantially improving sample efficiency and outperforming 16 models on multiparameter optimization tasks. This makes it practical to optimize directly against high-fidelity Density Functional Theory (DFT) simulations, which are accurate but computationally expensive oracles, with the aim of improving hit rates in drug discovery.
Topics
- Generative Molecular Design
- Mamba Architecture
- Sample Efficiency
- Drug Discovery
- Density Functional Theory
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Nature Machine Intelligence.