A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation
Summary
The MusiCorpus dataset, comprising 1,309 pages of historical and primarily handwritten sheet music, has been released to advance Optical Music Recognition (OMR). It addresses a critical gap in the field: progress in deep-learning-based OMR has been hampered by a lack of training data that reflects the realistic conditions found in memory institutions such as libraries, museums, and archives. MusiCorpus includes MusicXML transcriptions and symbol annotations, and it is presented as the largest dataset of handwritten music to date. It is designed for training and evaluating both end-to-end and object detection-based OMR systems, so that the two approaches can be compared directly.
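As a rough illustration of how a MusicXML transcription could feed an end-to-end training pipeline, the Python sketch below flattens the notes of a score into a token sequence using only the standard library. The sample score, the function name `note_tokens`, and the token format are assumptions made for this example; they are not taken from MusiCorpus or its tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written MusicXML fragment (not a MusiCorpus file).
SAMPLE = """<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration><type>quarter</type></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch>
        <duration>4</duration><type>quarter</type></note>
      <note><rest/><duration>8</duration><type>half</type></note>
    </measure>
  </part>
</score-partwise>"""


def note_tokens(musicxml_text):
    """Flatten <note> elements into tokens such as 'F#4_quarter'.

    The token format is an assumption for illustration only.
    """
    root = ET.fromstring(musicxml_text)
    tokens = []
    for note in root.iter("note"):
        kind = note.findtext("type", default="unknown")
        if note.find("rest") is not None:
            tokens.append(f"rest_{kind}")
            continue
        pitch = note.find("pitch")
        if pitch is None:
            # e.g. unpitched percussion; kept as a generic token here
            tokens.append(f"unpitched_{kind}")
            continue
        step = pitch.findtext("step")
        octave = pitch.findtext("octave")
        alter = float(pitch.findtext("alter", default="0"))
        # Only single sharps and flats are spelled out in this sketch.
        accidental = {1.0: "#", -1.0: "b"}.get(alter, "")
        tokens.append(f"{step}{accidental}{octave}_{kind}")
    return tokens


print(note_tokens(SAMPLE))
```

A real pipeline would also need to handle clefs, key and time signatures, chords, and multiple voices, which this sketch ignores.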
Key takeaway
For computer vision engineers developing Optical Music Recognition systems, MusiCorpus is a substantial new resource. Consider integrating it into your training and evaluation pipelines, especially if your work involves historical or handwritten scores. Doing so can help you build more robust models and benchmark them against material that resembles real musical heritage collections, where suitable training data was previously scarce.
Key insights
MusiCorpus provides the largest dataset of historical handwritten music for Optical Music Recognition.
Principles
- Realistic data drives OMR progress
- Handwritten music recognition is critical
In practice
- Train end-to-end OMR systems
- Evaluate object detection OMR
- Compare OMR system performance
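The practices listed above could be supported by two common measurements: a symbol error rate for sequence outputs from end-to-end systems, and intersection over union (IoU) for boxes from object detectors. The Python sketch below shows both under stated assumptions; the function names, token labels, and box layout `(x_min, y_min, x_max, y_max)` are hypothetical, and the summary does not say which metrics the dataset authors use.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


# Illustrative values only; not drawn from any dataset.
reference = ["clef_G", "C4_quarter", "D4_quarter", "rest_half"]
predicted = ["clef_G", "C4_quarter", "E4_quarter"]
print(symbol_error_rate(reference, predicted))  # one substitution, one deletion
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Comparing an end-to-end system with a detection-based one on equal terms would additionally require converting detections into the same symbol sequence, a step left out here.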
Topics
- Optical Music Recognition
- MusiCorpus Dataset
- Historical Music Scores
- Handwritten Music
- MusicXML Transcriptions
Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.