A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Creative Industries & Arts · Depth: Expert, quick

Summary

MusiCorpus is a newly released dataset of 1,309 pages of historical, primarily handwritten sheet music, intended to advance Optical Music Recognition (OMR). It addresses a long-standing gap: deep learning progress in OMR has been held back by a lack of training data that reflects the realistic conditions found in memory institutions such as libraries, museums, and archives. It is the largest dataset of handwritten music to date and includes both MusicXML transcriptions and symbol annotations. With both kinds of ground truth available, it can be used to train and evaluate end-to-end as well as object detection-based OMR systems, and to compare their performance directly.
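To make the idea of evaluating against MusicXML ground truth concrete, here is a minimal sketch, assuming a simplified token format and a symbol error rate based on edit distance. The MusicXML fragment, the token scheme, and the function names (`musicxml_to_tokens`, `symbol_error_rate`) are illustrative assumptions; they are not taken from MusiCorpus, and the metric is not necessarily the one the dataset's authors use.

```python
import xml.etree.ElementTree as ET

# Hypothetical MusicXML fragment for illustration; not taken from MusiCorpus.
REFERENCE_XML = """
<score-partwise version="4.0">
  <part-list>
    <score-part id="P1"><part-name>Voice</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
      <note><rest/><duration>2</duration><type>half</type></note>
    </measure>
  </part>
</score-partwise>
"""


def musicxml_to_tokens(xml_text):
    """Flatten the <note> elements of a MusicXML string into simple tokens.

    The token format is an assumption made for this sketch.
    """
    root = ET.fromstring(xml_text.strip())
    tokens = []
    for note in root.iter("note"):
        note_type = note.findtext("type", default="unknown")
        pitch = note.find("pitch")
        if pitch is None:
            # No <pitch> child: treat the element as a rest.
            tokens.append(f"rest_{note_type}")
        else:
            step = pitch.findtext("step")
            alter = pitch.findtext("alter", default="0")
            octave = pitch.findtext("octave")
            tokens.append(f"{step}{octave}_alt{alter}_{note_type}")
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = musicxml_to_tokens(REFERENCE_XML)
# Hypothetical system output with one wrong pitch (F4 instead of E4).
prediction = ["C4_alt0_quarter", "F4_alt0_quarter", "rest_half"]
print(f"symbol error rate: {symbol_error_rate(reference, prediction):.3f}")
```

Scoring every system on the same token sequence is one way to compare an end-to-end model with a detection-based pipeline, provided the detector's output is first assembled into the same token format.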

Key takeaway

For computer vision engineers developing Optical Music Recognition systems, MusiCorpus offers training and evaluation data at a scale not previously available for handwritten music. Consider integrating it into your training and evaluation pipelines, especially if you work with historical or handwritten scores. Doing so lets you build more robust models and benchmark them against material that resembles real musical heritage collections, rather than being limited by data scarcity.

Key insights

MusiCorpus is the largest dataset of handwritten music for Optical Music Recognition to date, with 1,309 pages of historical, primarily handwritten scores.

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.