Foundations of Molecular Generation with GP-MoLFormer on AMD Instinct MI300X Accelerators

Source: AMD ROCm Blogs · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Computational Chemistry · Depth: Intermediate, long

Summary

The article introduces GP-MoLFormer, an open-source generative foundation model from IBM for molecular generation, trained on more than a billion canonical SMILES strings. It builds on the MoLFormer architecture: a decoder-only transformer trained with causal language modeling, using linear attention and Rotary Positional Embeddings (RoPE) to learn molecular syntax and patterns. Unlike MatterGen, which targets 3D crystalline structures, GP-MoLFormer operates on symbolic molecular representations and treats generation as a sequence completion problem. The model supports unconditional generation, scaffold-constrained decoration, and property-oriented design via pair-tuning for properties such as QED, penalized logP, and DRD2 binding affinity. Evaluation follows the MOSES benchmark, which measures fragment and scaffold similarity, internal diversity (IntDiv_p), global distributional similarity (FCD), validity, uniqueness, and novelty. The article also gives a practical guide to setting up and running GP-MoLFormer on AMD Instinct MI300X accelerators using Docker and the AMD Container Toolkit.
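The validity, uniqueness, and novelty metrics mentioned in the summary are simple ratios over the generated set. The sketch below follows the commonly used MOSES-style definitions and is only an illustration, not the benchmark's own code. The names `toy_is_valid` and `generation_metrics` are hypothetical, the validity check is a toy stand-in for a real SMILES parser such as RDKit, and strings are compared verbatim, whereas a real evaluation would canonicalize them first.

```python
from typing import Callable, Dict, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Toy stand-in for a real SMILES parser (assumption, not chemistry).

    Checks only that parentheses are balanced and that every
    single-digit ring-closure label appears an even number of times.
    """
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: open on first sight, close on second
    return bool(smiles) and depth == 0 and not open_rings


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Dict[str, float]:
    """Return validity, uniqueness, and novelty as fractions in [0, 1]."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        # share of generated strings that pass the validity check
        "validity": len(valid) / len(generated) if generated else 0.0,
        # share of valid strings that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # share of distinct valid strings absent from the training set
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"]
    print(generation_metrics(sample, training=["CCO"]))
```

Passing a parser-backed function as `is_valid` would make the same ratios chemically meaningful; the fragment, scaffold, IntDiv_p, and FCD metrics need fingerprints or a pretrained network and are out of scope for this sketch.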

Key takeaway

For AI Engineers and Research Scientists working on drug discovery or materials science, GP-MoLFormer is an open-source, sequence-based option for molecular generation. It is worth evaluating for exploring chemical space, particularly when you need to steer generation toward specific properties through pair-tuning, and it can be run on AMD Instinct GPUs.

Key insights

GP-MoLFormer enables efficient, large-scale molecular generation using sequence-based language models on AMD GPUs.

Method

GP-MoLFormer uses a decoder-only transformer trained with causal language modeling on SMILES strings. It supports three generation modes: unconditional generation, scaffold-constrained decoration, and property-oriented generation via pair-tuning, which learns prompt vectors that bias sampling toward regions of chemical space with the desired property.
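To make the "generation as sequence completion" idea concrete, here is a deliberately tiny Python sketch. It swaps the transformer for a character-level bigram model, so it illustrates only the autoregressive loop (predict the next token from what came before, append, repeat), not GP-MoLFormer itself. The function names `fit_bigram` and `complete`, the `^` and `$` boundary markers, and the character-level tokenization are all assumptions made for illustration; real SMILES tokenizers treat multi-character atoms such as `Cl` as single tokens.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "^", "$"  # hypothetical begin/end-of-sequence markers


def fit_bigram(smiles_list):
    """Count next-character frequencies: a stand-in for a trained causal LM."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        tokens = [BOS, *s, EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prefix="", max_len=40, rng=None):
    """Extend `prefix` one token at a time until EOS or `max_len`.

    An empty prefix corresponds to unconditional generation; a
    non-empty prefix mimics prompting the model with a scaffold.
    """
    rng = rng or random.Random(0)
    out = list(prefix)
    prev = out[-1] if out else BOS
    while len(out) < max_len:
        options = counts.get(prev)
        if not options:  # character never seen during fitting
            break
        tokens, weights = zip(*options.items())
        nxt = rng.choices(tokens, weights=weights)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    corpus = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
    model = fit_bigram(corpus)
    print("unconditional:", complete(model, rng=random.Random(1)))
    print("from prefix  :", complete(model, prefix="c1cc", rng=random.Random(1)))
```

A bigram model has no long-range memory, so its outputs are often not valid SMILES. The point is only that unconditional and prefix-constrained generation share one sampling loop and differ in the starting context. Pair-tuning, by contrast, would change the context by prepending learned prompt vectors, which this sketch does not attempt.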

Best for: AI Engineer, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.