DDA-BERT: end-to-end training for data-dependent acquisition mass spectrometry-based proteomics

Source: Machine learning : nature.com subject feeds · Field: Science & Research — Life Sciences & Biology, Artificial Intelligence & Machine Learning · Depth: Expert, short

Summary

DDA-BERT is a new transformer-based, end-to-end deep learning model for peptide-spectrum match (PSM) rescoring in data-dependent acquisition (DDA)-based proteomics. The model was trained on approximately 271 million PSMs from 11 species. Compared with existing tools, DDA-BERT increases peptide identifications by 2.24%–269.35% on human, 3.73%–141.46% on yeast, 5.53%–45.64% on *Drosophila*, and 3.68%–62.77% on *Arabidopsis* datasets. It also increases peptide identifications by 4.14%–87.47% in HLA immunopeptidomics data and maintains high sensitivity in trace-level proteomics samples. Its main limitations are the need for GPU-based computing and for large, diverse training datasets to reach optimal performance.
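To make the percentage ranges concrete, the sketch below shows how a relative gain in identifications is typically computed against a baseline tool. The function name and the peptide counts are hypothetical, chosen only so the results land on the endpoints of the reported human range; they are not figures from the article, and the article's exact counting procedure is not described here.

```python
def relative_gain_percent(baseline_ids: int, new_ids: int) -> float:
    """Relative increase in identifications over a baseline, in percent."""
    if baseline_ids <= 0:
        raise ValueError("baseline_ids must be positive")
    return (new_ids - baseline_ids) / baseline_ids * 100.0


# Hypothetical counts (not from the article): a baseline tool identifies
# 10,000 peptides; the rescoring model identifies 10,224 or 36,935.
low = relative_gain_percent(10_000, 10_224)
high = relative_gain_percent(10_000, 36_935)
print(f"{low:.2f}% to {high:.2f}%")
```

A gain above 100% means the identification count more than doubled relative to the baseline, which is why the upper end of a range can look large when the baseline tool identifies few peptides on a given dataset.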

Key takeaway

For proteomics researchers seeking to improve peptide identification accuracy and sensitivity, DDA-BERT is a transformer-based rescoring model worth integrating into DDA workflows, especially for human, yeast, *Drosophila*, *Arabidopsis*, and HLA immunopeptidomics data. Plan for the computational cost: it requires GPU-based computing, and its peak performance depends on substantial, diverse training data.

Key insights

DDA-BERT is a transformer-based, end-to-end model that rescores PSMs and thereby raises peptide identification rates in DDA proteomics relative to existing tools.

Principles

Method

DDA-BERT employs a transformer-based architecture for end-to-end deep learning, trained on large, diverse PSM datasets to directly refine peptide-spectrum match ranking and confidence estimation.
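As a rough illustration of what PSM rescoring means in general, the sketch below re-ranks candidate peptides for each spectrum using scores from a model. It is not the DDA-BERT implementation: the `PSM` type, the `rescore` function, and the `model_scores` lookup (a stand-in for a trained model's output) are all hypothetical, and the numbers are invented for the example.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class PSM(NamedTuple):
    spectrum_id: str
    peptide: str
    engine_score: float  # score assigned by the upstream search engine


def rescore(psms: Iterable[PSM], score_fn: Callable[[PSM], float]):
    """Keep the best candidate per spectrum under score_fn, ranked by new score."""
    by_spectrum = defaultdict(list)
    for psm in psms:
        by_spectrum[psm.spectrum_id].append((score_fn(psm), psm))
    # One winner per spectrum, chosen by the new score rather than engine_score.
    best = [max(cands, key=lambda pair: pair[0]) for cands in by_spectrum.values()]
    return sorted(best, key=lambda pair: pair[0], reverse=True)


# Hypothetical candidates: the search engine slightly prefers "PEPK" for s1.
psms = [
    PSM("s1", "PEPTIDEK", 0.60),
    PSM("s1", "PEPK", 0.65),
    PSM("s2", "ACDEFGHIK", 0.30),
]

# Hypothetical model outputs standing in for a learned rescoring model.
model_scores = {
    ("s1", "PEPTIDEK"): 0.92,
    ("s1", "PEPK"): 0.40,
    ("s2", "ACDEFGHIK"): 0.75,
}

ranked = rescore(psms, lambda p: model_scores[(p.spectrum_id, p.peptide)])
for score, psm in ranked:
    print(psm.spectrum_id, psm.peptide, score)
```

In this toy case the model overturns the search engine's top choice for spectrum `s1`, which is the kind of ranking refinement the method description refers to. Confidence estimation would be layered on top of the ranked list and is not shown.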

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.