BullingerDB: A Dataset for Handwritten Text Recognition and Writer Retrieval

2026-05-28 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Advanced, quick

Summary

BullingerDB is a large-scale benchmark dataset for historical document analysis, targeting handwritten text recognition (HTR) and writer retrieval. Derived from the correspondence of Heinrich Bullinger (1504-1575), it comprises 20,898 pages and 499,222 text lines from 796 writers, spanning six decades. The material shows significant stylistic variation, is multilingual (primarily Latin and Early New High German), and carries metadata such as writer identity and date. On text recognition, TrOCR achieves a Character Error Rate (CER) of 9.1%. On writer retrieval, mean average precision (mAP) reaches 78.3%, and the authors introduce a temporal nDCG metric alongside it; the results highlight the challenges posed by long-term stylistic change. The dataset aims to establish a benchmark for multilingual historical text recognition and temporally aware writer analysis.
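
For readers unfamiliar with the CER figure quoted above, the following is a minimal sketch of the usual definition: total character-level edit distance divided by total reference length. The function names `edit_distance` and `cer` are illustrative, and the paper's exact normalization (for example, whitespace or punctuation handling) is not stated in this summary, so treat this as an assumption rather than the authors' evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a CER of 9.1% corresponds to roughly nine character edits per hundred reference characters.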

Key takeaway

For Machine Learning Engineers developing historical document analysis systems, BullingerDB offers a large, multilingual benchmark. Consider using it to train and evaluate handwritten text recognition and writer retrieval models, especially when handwriting style varies over time. Use the temporal nDCG metric alongside mAP so that your writer retrieval evaluation accounts for chronological changes in handwriting.

Key insights

BullingerDB is a large, multilingual dataset for historical HTR and writer retrieval, introducing temporal metrics and highlighting challenges from long-term stylistic variation.

Principles

Historical HTR benefits from multilingual, time-aware data.
Long-term stylistic shifts complicate writer retrieval.
Temporal metrics are vital for historical writer analysis.

Method

The study introduces BullingerDB, a dataset for historical document analysis, and proposes a temporal nDCG metric to assess time-aware writer retrieval performance, complementing standard mAP scores.
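
As a point of reference for the standard mAP score mentioned above, here is a minimal sketch of its textbook computation for writer retrieval: for each query, the ranked gallery is scanned and precision is averaged at every rank where the writer matches. The data layout (a query label plus a ranked list of gallery writer labels) and the function names are assumptions made for illustration; this summary does not describe the paper's evaluation protocol in detail.

```python
def average_precision(query_writer: str, ranked_writers: list[str]) -> float:
    """Average precision for one query over a ranked list of gallery writers."""
    hits = 0
    precisions = []
    for rank, writer in enumerate(ranked_writers, start=1):
        if writer == query_writer:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries: list[tuple[str, list[str]]]) -> float:
    """Mean of per-query average precision; each query is (writer, ranking)."""
    if not queries:
        raise ValueError("no queries given")
    return sum(average_precision(q, r) for q, r in queries) / len(queries)
```

Note that mAP treats every same-writer document as equally relevant, regardless of when it was written.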

In practice

Train HTR models on multilingual historical texts.
Evaluate writer retrieval with temporal nDCG.
Develop models robust to long-term style changes.
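
To make the idea of a time-aware retrieval score concrete, the sketch below computes an nDCG in which a retrieved document's gain depends on both writer identity and temporal distance to the query. The gain function here (exponential decay in the year gap, controlled by a hypothetical `scale` parameter) is purely an illustrative assumption; this summary does not give the paper's definition of temporal nDCG, and the authors' formulation may differ.

```python
import math

# Each document is a (writer, year) pair; this layout is an assumption.
Doc = tuple[str, int]


def temporal_gain(query: Doc, item: Doc, scale: float = 10.0) -> float:
    """Hypothetical graded relevance: 0 for a different writer, otherwise
    decaying with the absolute year gap between query and item."""
    if item[0] != query[0]:
        return 0.0
    return math.exp(-abs(item[1] - query[1]) / scale)


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain with the standard log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def temporal_ndcg(query: Doc, ranked: list[Doc], scale: float = 10.0) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    gains = [temporal_gain(query, item, scale) for item in ranked]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A graded gain of this kind lets the score distinguish rankings that binary relevance would treat as identical, which is the general motivation for pairing an nDCG-style metric with mAP.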

Topics

BullingerDB
Handwritten Text Recognition
Writer Retrieval
Historical Document Analysis
Multilingual Text
Temporal Metrics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.