Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching

Source: Machine Learning · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computational Chemistry) · Depth: Expert, quick

Summary

RetroDiT is a structure-aware, template-free retrosynthesis framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. It places reaction-center atoms at the head of the atom sequence, turning implicit chemical knowledge into an explicit positional pattern. RetroDiT, a graph transformer with rotary position embeddings, uses this ordering to prioritize chemically critical regions. Pairing it with discrete flow matching decouples training from sampling, so generation takes 20-50 steps instead of the 500 used by prior diffusion methods, a 10- to 25-fold reduction. With predicted reaction centers, the approach reaches state-of-the-art top-1 accuracy of 61.2% on USPTO-50k and 51.3% on USPTO-Full, outperforming larger foundation models while using significantly less training data. Ablation studies indicate that structural priors are more effective than brute-force model scaling.

Key takeaway

For AI researchers developing retrosynthesis models, this work shows that explicitly encoding chemical structure and reaction centers in the neural representation improves both accuracy and sampling efficiency. Positional inductive biases let a model reach state-of-the-art results with significantly fewer parameters and less data than brute-force scaling or extensive reaction libraries would require. Consider adding structure-aware encoding to your model architectures.

Key insights

Atom ordering in neural representations can encode chemical knowledge, improving retrosynthesis model efficiency and accuracy.
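A minimal sketch of what reaction-center-first ordering could look like on a toy molecule. The helper names (`reaction_center_first_order`, `permute_graph`), the bond encoding, and the choice to keep the original relative order inside each group are assumptions for illustration; the summary does not say how RetroDiT orders atoms within the head or the tail.

```python
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order); hypothetical encoding


def reaction_center_first_order(num_atoms: int, reaction_center: Set[int]) -> List[int]:
    """Return old atom indices with reaction-center atoms first.

    Assumption: within each group the original relative order is kept.
    """
    head = [i for i in range(num_atoms) if i in reaction_center]
    tail = [i for i in range(num_atoms) if i not in reaction_center]
    return head + tail


def permute_graph(
    atoms: List[str], bonds: List[Bond], order: List[int]
) -> Tuple[List[str], List[Bond]]:
    """Relabel atoms and bonds so that position p holds old atom order[p]."""
    new_index: Dict[int, int] = {old: new for new, old in enumerate(order)}
    new_atoms = [atoms[old] for old in order]
    new_bonds = sorted(
        (
            min(new_index[a], new_index[b]),
            max(new_index[a], new_index[b]),
            bond_order,
        )
        for a, b, bond_order in bonds
    )
    return new_atoms, new_bonds


# Toy acetamide heavy-atom graph: C0-C1(=O2)-N3.
atoms = ["C", "C", "O", "N"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

# Assume the C1-N3 amide bond is the reaction center.
order = reaction_center_first_order(len(atoms), {1, 3})
new_atoms, new_bonds = permute_graph(atoms, bonds, order)

print(order)      # old indices in their new positions
print(new_atoms)  # atom symbols after reordering
print(new_bonds)  # bonds expressed in the new indexing
```

The graph itself is unchanged by the permutation; only the positions that a position-aware model would see are different, which is what lets sequence position carry the chemical signal.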

Method

RetroDiT is a graph transformer with rotary position embeddings that operates on atoms ordered with the reaction center at the sequence head. It is combined with discrete flow matching for efficient retrosynthesis generation.
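To illustrate why rotary position embeddings pair naturally with a meaningful atom order, here is a sketch of the standard rotary embedding applied to single vectors. It is generic RoPE, not RetroDiT's implementation: the function name `rope`, the `base` value of 10000, and the toy vectors are assumptions, and the summary does not describe how the embedding is wired into the graph transformer.

```python
import math
from typing import List


def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary embeddings need an even dimension")
    out: List[float] = []
    for i in range(0, dim, 2):
        # Pair k = i // 2 uses frequency base ** (-2k / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# The attention score depends only on the offset between positions:
# (0, 3) and (5, 8) share an offset of 3, so the scores match.
score_a = dot(rope(q, 0), rope(k, 3))
score_b = dot(rope(q, 5), rope(k, 8))
print(math.isclose(score_a, score_b, abs_tol=1e-9))
```

Because the score depends on relative position, an ordering that puts reaction-center atoms at the head gives every other atom a consistent positional relationship to the chemically critical region.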

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.