Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching
Summary
RetroDiT is a structure-aware, template-free retrosynthesis framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. It places reaction-center atoms at the head of the atom sequence, turning implicit chemical knowledge into an explicit positional pattern. RetroDiT, a graph transformer with rotary position embeddings, uses this ordering to prioritize chemically critical regions. Discrete flow matching decouples training from sampling, so generation takes 20-50 steps compared with 500 for prior diffusion methods. With predicted reaction centers, the approach reaches state-of-the-art top-1 accuracy of 61.2% on USPTO-50k and 51.3% on USPTO-Full, outperforming larger foundation models while using significantly less training data. Ablation studies indicate that structural priors are more effective than brute-force model scaling.
Key takeaway
For AI researchers developing retrosynthesis models, this work shows that explicitly encoding reaction centers and chemical structure into the neural representation improves both accuracy and sampling efficiency. Positional inductive biases can deliver state-of-the-art results with significantly fewer parameters and less data than brute-force scaling or extensive reaction libraries. Consider integrating structure-aware encoding into your model architectures.
Key insights
Atom ordering in neural representations can encode chemical knowledge, improving retrosynthesis model efficiency and accuracy.
Principles
- Positional inductive bias enhances chemical reaction modeling.
- Structural priors outperform brute-force model scaling.
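To illustrate why position can act as an inductive bias, here is a minimal sketch of generic rotary position embeddings (RoPE) in plain Python. It is not RetroDiT's code; the function names `rope` and `dot`, the base of 10000, and the toy vectors are assumptions made for illustration. It shows the standard RoPE property that the query-key score depends only on the offset between two positions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `position`."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE rotates pairs, so the dimension must be even"
    out = []
    for pair in range(dim // 2):
        # Each pair gets its own rotation frequency.
        theta = position * base ** (-2.0 * pair / dim)
        x, y = vec[2 * pair], vec[2 * pair + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (illustrative values only).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Both pairs of positions are three apart, so the two scores agree.
score_near_head = dot(rope(q, 0), rope(k, 3))
score_shifted = dot(rope(q, 5), rope(k, 8))
```

Because the score depends on relative offsets, a fixed convention for which atoms occupy the first positions gives the attention layers a consistent positional pattern to learn from.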
Method
RetroDiT uses a graph transformer with rotary position embeddings, placing reaction center atoms at the sequence head, combined with discrete flow matching for efficient retrosynthesis generation.
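The reordering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `reorder_reaction_center_first`, the tuple-based bond format, and the choice to keep the remaining atoms in their original order are all hypothetical.

```python
def reorder_reaction_center_first(atoms, bonds, reaction_center):
    """Renumber a molecular graph so reaction-center atoms lead the sequence.

    atoms: list of element symbols, indexed by atom id.
    bonds: list of (i, j, bond_order) tuples over atom ids.
    reaction_center: atom ids assumed to be the (predicted) reaction center.
    """
    center = set(reaction_center)
    # Stable partition: reaction-center atoms first, then everything else.
    # Keeping the rest in original order is an assumption of this sketch.
    order = [i for i in range(len(atoms)) if i in center]
    order += [i for i in range(len(atoms)) if i not in center]
    new_index = {old: new for new, old in enumerate(order)}
    new_atoms = [atoms[old] for old in order]
    # Rewrite each bond with the new atom ids, smaller id first.
    new_bonds = sorted(
        (min(new_index[i], new_index[j]), max(new_index[i], new_index[j]), bond_order)
        for i, j, bond_order in bonds
    )
    return new_atoms, new_bonds, order


# Toy example: N-methylacetamide heavy atoms, CH3-C(=O)-N-CH3.
atoms = ["C", "C", "O", "N", "C"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)]
# Assume the amide C-N bond (atoms 1 and 3) is the reaction center.
new_atoms, new_bonds, order = reorder_reaction_center_first(atoms, bonds, [1, 3])
```

In this toy case `order` is `[1, 3, 0, 2, 4]`, so the carbonyl carbon and the nitrogen occupy positions 0 and 1 and the amide bond becomes `(0, 1, 1)`.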
In practice
- Prioritize reaction center atoms in sequence representations.
- Employ discrete flow matching for faster generation.
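To make the point about step counts concrete, the following is a toy sketch of a generic Euler-style sampler for discrete flow matching over a token sequence. It is not RetroDiT's sampler: the linear schedule, the uniform noise start, the flat token sequence (rather than a molecular graph), and the `predict_clean` callable standing in for a trained denoiser are all assumptions. The point it illustrates is that `num_steps` is chosen at sampling time and is not baked into the model.

```python
import random


def sample_discrete_flow(predict_clean, length, vocab, num_steps, seed=0):
    """Euler-style sampling for a discrete flow with a linear schedule.

    predict_clean(x, t) is a hypothetical denoiser that returns one predicted
    clean token per position, given the current noisy sequence x and time t.
    """
    rng = random.Random(seed)
    # Start from pure noise: each position drawn uniformly from the vocabulary.
    x = [rng.choice(vocab) for _ in range(length)]
    for k in range(num_steps):
        t = k / num_steps
        x1_hat = predict_clean(x, t)
        # With step size h = 1/num_steps, the jump probability h / (1 - t)
        # simplifies to 1 / (num_steps - k); it reaches 1 on the final step.
        jump_prob = 1.0 / (num_steps - k)
        x = [
            x1_hat[i] if rng.random() < jump_prob else x[i]
            for i in range(length)
        ]
    return x


# Toy oracle denoiser that always predicts a fixed target sequence.
target = list("CC(=O)O")
vocab = sorted(set(target) | {"N"})


def oracle(x, t):
    return target


# The same "model" can be sampled with different step budgets.
result_20 = sample_discrete_flow(oracle, len(target), vocab, num_steps=20)
result_50 = sample_discrete_flow(oracle, len(target), vocab, num_steps=50)
```

With the oracle denoiser, both step budgets end at the target sequence; with a learned denoiser, the step count becomes a speed-versus-quality setting.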
Topics
- Retrosynthesis
- Discrete Flow Matching
- Graph Transformers
- Positional Embeddings
- Reaction Center Prediction
Best for: AI Researcher, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.