Structure-informed deep generation enables de novo metabolite annotation in untargeted metabolomics

· Source: Machine learning : nature.com subject feeds · Field: Science & Research — Life Sciences & Biology, Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, long

Summary

MetGenX, a structure-informed encoder-decoder neural network, tackles de novo metabolite annotation in untargeted metabolomics by generating metabolite structures directly from MS2 spectra. The work, published on April 20, 2026, reformulates the spectrum-to-structure task as a structure-to-structure generation problem, which the authors report improves both accuracy and chemical space coverage. In independent tests, MetGenX achieved a top-1 accuracy of 55.9% on 1388 NIST MS2 spectra and 68.5% on 1681 spectra from real biological samples, outperforming the existing in silico tools it was compared against. The same model handles spectra acquired in both positive and negative ionization modes without retraining. A multi-step annotation workflow built on MetGenX identified two previously uncharacterized metabolites in untargeted metabolomics data from mouse liver; neither compound appears in major human metabolome databases.
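Top-1 accuracy, the headline metric above, is the fraction of test spectra for which the highest-ranked generated structure matches the reference structure. The sketch below shows that computation under stated assumptions: structures are compared as plain identifier strings (for example InChIKey connectivity blocks), which may not match the criterion the authors used, and `top_k_accuracy` is a hypothetical helper, not part of MetGenX.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   references: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of spectra whose reference structure is among the top-k candidates.

    Each element of `ranked_predictions` is the candidate list for one spectrum,
    best first. Identifiers are compared as exact strings (an assumption).
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("need one ranked candidate list per reference spectrum")
    if not references:
        raise ValueError("no spectra to evaluate")
    # True counts as 1 when summed, so this counts spectra with a hit in the top k.
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)


# Toy example with made-up identifiers: two of three spectra are correct at rank 1.
preds = [["AAA", "BBB"], ["CCC", "AAA"], ["DDD"]]
refs = ["AAA", "AAA", "DDD"]
print(round(top_k_accuracy(preds, refs, k=1), 3))  # 0.667
print(top_k_accuracy(preds, refs, k=2))            # 1.0
```

As a sanity check on the reported figures, 55.9% of 1388 NIST spectra corresponds to 776 spectra with a correct top-1 structure (776 / 1388 ≈ 0.559).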

Key takeaway

For metabolomics researchers struggling to identify unknown metabolites, MetGenX is a significant advance: a structure-informed deep generation tool with high reported top-1 accuracy. Consider integrating MetGenX into your untargeted metabolomics workflows to strengthen de novo annotation and speed up the discovery of novel chemical entities, especially compounds absent from existing databases.
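The summary does not spell out the steps of the multi-step annotation workflow. One plausible post-generation step when integrating a structure generator into a pipeline is a mass-consistency filter: keep only candidates whose monoisotopic mass agrees with the neutral mass implied by the precursor ion. The sketch below assumes singly charged [M+H]+ (positive mode) or [M-H]- (negative mode) precursors, a 10 ppm tolerance, and candidates already expressed as molecular formulas; all function names are hypothetical.

```python
import re
from typing import Iterable

# Monoisotopic masses (Da) of the most abundant isotope of common metabolite elements.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276466812


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H12O6' (no charges or brackets)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return sum(MONOISOTOPIC_MASS[el] * (int(n) if n else 1) for el, n in tokens)


def neutral_mass(precursor_mz: float, mode: str) -> float:
    """Neutral mass, assuming a singly charged [M+H]+ or [M-H]- precursor."""
    if mode == "positive":
        return precursor_mz - PROTON_MASS
    if mode == "negative":
        return precursor_mz + PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def filter_candidates(candidate_formulas: Iterable[str], precursor_mz: float,
                      mode: str, tol_ppm: float = 10.0) -> list[str]:
    """Keep candidates whose mass lies within tol_ppm of the precursor's neutral mass."""
    target = neutral_mass(precursor_mz, mode)
    return [f for f in candidate_formulas
            if abs(formula_mass(f) - target) / target * 1e6 <= tol_ppm]


print(filter_candidates(["C6H12O6", "C6H12O5", "C7H16O5"], 181.0707, "positive"))
# ['C6H12O6']
```

In a real pipeline the formulas would come from the generated structures; for SMILES output, RDKit's `rdMolDescriptors.CalcMolFormula` is one way to obtain them. Other adducts and charge states would need their own mass offsets.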

Key insights

MetGenX uses a structure-informed deep neural network to generate metabolite structures directly from MS2 spectra, improving annotation accuracy.

Principles

Method

MetGenX employs an encoder-decoder neural network to generate metabolite structures from MS2 spectra. Rather than mapping spectra to structures directly, it recasts the task as structure-to-structure generation, which the authors report improves both accuracy and chemical space coverage.
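This summary does not say how MetGenX obtains the structures its encoder works from. To make the idea of a structure-to-structure formulation concrete, the sketch below shows one hypothetical front end: bin the query MS2 spectrum, score it against a reference library by cosine similarity, and return the structures (SMILES) of the closest matches, which a generator could then condition on instead of raw peaks. The binning scheme, similarity measure, and all names are assumptions for illustration, not the authors' method.

```python
import math


def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Map (m/z, intensity) peaks to a fixed-length vector, keeping the max intensity per bin."""
    n_bins = math.ceil(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, inten in peaks:
        if 0 <= mz < max_mz:
            i = min(int(mz / bin_width), n_bins - 1)  # guard against float edge cases
            vec[i] = max(vec[i], inten)
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_reference_structures(query_peaks, library, top_n=3):
    """Return SMILES of the top_n library entries most similar to the query spectrum.

    `library` is a list of (smiles, peaks) pairs. In a structure-to-structure setup
    (hypothetical here), these structures rather than raw peaks feed the encoder.
    """
    q = bin_spectrum(query_peaks)
    scored = [(cosine(q, bin_spectrum(peaks)), smiles) for smiles, peaks in library]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [smiles for _, smiles in scored[:top_n]]


# Toy library: real SMILES, but made-up peak lists (not measured spectra).
library = [
    ("OCC(O)CO", [(57.03, 100.0), (75.04, 40.0)]),
    ("CC(=O)O", [(43.02, 100.0), (61.03, 20.0)]),
]
print(retrieve_reference_structures([(57.03, 90.0), (75.04, 35.0)], library, top_n=1))
# ['OCC(O)CO']
```

Fixed-width binning with cosine scoring is a deliberately simple choice; practical spectral matching usually adds intensity scaling and m/z-tolerance-aware peak alignment.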

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.