Customized Amazon Nova models improve molecular-property prediction in drug discovery
Summary
Amazon's Generative AI Innovation Center, in collaboration with Nimbus Therapeutics, has developed a customized Amazon Nova large language model (LLM) that improves molecular-property prediction in drug discovery. This single fine-tuned LLM unifies the prediction of 11 molecular properties spanning lipophilicity, permeability, and clearance, a task that traditionally requires multiple specialized graph neural networks (GNNs). The approach applies supervised fine-tuning (SFT) followed by reinforcement fine-tuning (RFT) to Nova 2 Lite, matching or exceeding GNN accuracy on 7 of 11 properties, with an average RMSE only 5% higher than the baseline GNNs. The solution streamlines the drug discovery workflow, reduces operational complexity, and gives medicinal chemists a conversational interface, potentially accelerating drug development, which currently takes 10-15 years and costs over $2 billion per drug.
Key takeaway
For medicinal chemists and biotech teams evaluating molecular properties, this means multiple GNN-based prediction tasks can be consolidated into a single, interactive LLM. That simplifies the workflow, reduces infrastructure overhead, and enables conversational interaction for reasoning and molecular-modification suggestions, which could accelerate early-stage drug design and increase the throughput of viable candidates.
Key insights
A single fine-tuned LLM can unify molecular-property prediction, matching or exceeding specialized GNNs on 7 of 11 properties.
Principles
- Foundational knowledge precedes performance optimization: SFT comes first, RFT second.
- Huber loss provides stable, effective RFT rewards.
Method
Customize a general-purpose LLM (Nova 2 Lite) using supervised fine-tuning (SFT) on more than 55,000 molecules, followed by reinforcement fine-tuning (RFT) with Huber loss-based rewards.
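To make the Huber loss-based reward concrete, here is a minimal Python sketch of how such a reward might be computed for one predicted property value. The original article does not publish its reward function, so everything here is an assumption for illustration: the function names, the `delta` threshold of 1.0, the choice to use the negated loss as the reward, and the fixed `parse_penalty` for completions that are not valid numbers.

```python
import math


def huber_loss(error: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * abs_error ** 2
    return delta * (abs_error - 0.5 * delta)


def huber_reward(predicted: float, target: float, delta: float = 1.0) -> float:
    """Assumed reward shape: the negated Huber loss, so 0.0 is the best score."""
    return -huber_loss(predicted - target, delta)


def reward_from_completion(completion: str, target: float,
                           delta: float = 1.0,
                           parse_penalty: float = -10.0) -> float:
    """Score a raw model completion that is expected to contain one number.

    parse_penalty is a hypothetical fixed reward for completions that
    cannot be read as a finite number.
    """
    try:
        predicted = float(completion.strip())
    except ValueError:
        return parse_penalty
    if not math.isfinite(predicted):
        return parse_penalty
    return huber_reward(predicted, target, delta)


if __name__ == "__main__":
    print(reward_from_completion("2.5", target=2.0))  # small error, quadratic region
    print(reward_from_completion("5.0", target=2.0))  # large error, linear region
    print(reward_from_completion("n/a", target=2.0))  # unparseable completion
```

Because the loss grows only linearly once the error exceeds `delta`, a single badly wrong prediction cannot dominate the reward signal the way it would under squared error, which is the usual reason Huber loss is considered a stable choice.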
In practice
- Use Nova Forge for LLM customization on SageMaker.
- Employ SFT for domain-specific knowledge acquisition.
- Apply RFT for predictive judgment and error minimization.
Topics
- Customized LLMs
- Drug Discovery
- Molecular Property Prediction
- Graph Neural Networks
- Supervised Fine-Tuning
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Amazon Science.