Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning
Summary
Researchers have adapted the Transformer-based Optical Character Recognition (TrOCR) model to printed Tigrinya text, which is written in the Ge'ez script. This marks the first such adaptation for an African syllabic writing system. The adaptation extends TrOCR's byte-level BPE tokenizer with 230 Ge'ez characters and introduces a novel Word-Aware Loss Weighting mechanism, which targets the word-boundary recognition errors that arise when Latin-centric BPE conventions are applied to a new script. The adapted TrOCR-Printed model achieved a 0.22% Character Error Rate and 97.20% exact match accuracy on a 5,000-image synthetic test set from the GLOCR dataset. The entire adaptation pipeline trains in under three hours on a single 8 GB consumer GPU, and all code and model weights are publicly released.
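To make the two reported metrics concrete, here is a minimal sketch of how Character Error Rate and exact match accuracy are commonly computed. It is an illustration, not the authors' evaluation script: the function names are hypothetical, and pooling edits over the whole test set (rather than averaging per image) is an assumption, since the summary does not say which convention was used.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edits divided by total reference characters (assumed pooling)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Made-up example: the first prediction drops the space between two words,
# which is the kind of word-boundary error the summary describes.
refs = ["ሰላም ዓለም", "ትግርኛ"]
hyps = ["ሰላምዓለም", "ትግርኛ"]
cer = character_error_rate(refs, hyps)
acc = exact_match_accuracy(refs, hyps)
```

A single dropped space counts as one character edit but turns the whole line into an exact-match failure, which is why the two metrics can diverge sharply when word boundaries are the main source of error.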
Key takeaway
Research scientists working on OCR for scripts other than Latin or CJK should consider adapting an existing Transformer model such as TrOCR by extending its tokenizer and adding script-specific loss weighting. In this work, Word-Aware Loss Weighting resolved systematic word-boundary errors, and the adapted model reached a 0.22% Character Error Rate after less than three hours of training on a single consumer-grade GPU.
Key insights
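Adapting TrOCR to Tigrinya text requires both tokenizer extension and Word-Aware Loss Weighting: the first makes Ge'ez characters representable, and the second corrects the word-boundary errors inherited from Latin-centric BPE conventions.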
Principles
- BPE tokenizers built around Latin conventions can mishandle word boundaries in a new script.
- Targeted loss weighting improves cross-script transfer learning.
Method
Extend TrOCR's byte-level BPE tokenizer for new characters and apply Word-Aware Loss Weighting to resolve systematic word-boundary failures during cross-script adaptation.
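The tokenizer-extension step can be sketched as follows. This is a toy illustration under stated assumptions, not the released code: `ethiopic_characters`, `extend_vocab`, and the toy vocabulary are hypothetical, the sketch enumerates the whole base Ethiopic Unicode block (U+1200 to U+137F) rather than the specific 230 characters the authors selected, and it assumes token ids are contiguous starting at 0.

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple


def ethiopic_characters(start: int = 0x1200, end: int = 0x137F) -> List[str]:
    """Assigned code points in the base Ethiopic block (syllables, digits, punctuation)."""
    chars = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Unassigned code points have no Unicode name, so skip them.
        if unicodedata.name(ch, ""):
            chars.append(ch)
    return chars


def extend_vocab(vocab: Dict[str, int],
                 new_tokens: Iterable[str]) -> Tuple[Dict[str, int], List[str]]:
    """Return a copy of `vocab` with unseen tokens appended, plus the list of added tokens."""
    extended = dict(vocab)
    added = []
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = len(extended)  # next free id (assumes contiguous ids)
            added.append(tok)
    return extended, added


# Hypothetical toy vocabulary standing in for the pretrained tokenizer's vocabulary.
toy_vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "Ġthe": 4}
extended, added = extend_vocab(toy_vocab, ethiopic_characters())
print(len(added), "tokens added; new vocabulary size:", len(extended))
```

With the Hugging Face `transformers` library, the analogous steps would be `tokenizer.add_tokens(...)` followed by resizing the decoder's embedding matrix with `resize_token_embeddings(len(tokenizer))`, so that each new token id has an embedding row to train. Whether the authors did exactly this is not stated in the summary.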
In practice
- Extend BPE tokenizers for new script characters.
- Implement word-aware loss weighting for boundary issues.
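One way to picture word-aware loss weighting is a token-level cross-entropy in which tokens that begin a new word count for more. The sketch below is an assumption-heavy illustration, since this summary does not give the authors' exact formulation: it assumes the GPT-2/RoBERTa byte-level BPE convention in which a leading `Ġ` marks a token preceded by a space, the `boundary_weight` of 2.0 is an arbitrary placeholder, and normalizing by the sum of weights is one of several reasonable choices.

```python
import math
from typing import List, Sequence

WORD_START = "Ġ"  # assumed marker for a token that follows a space


def token_weights(tokens: Sequence[str], boundary_weight: float = 2.0) -> List[float]:
    """Give word-initial tokens a larger weight than word-internal ones."""
    return [boundary_weight if t.startswith(WORD_START) else 1.0 for t in tokens]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one decoding step."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def weighted_cross_entropy(logits: Sequence[Sequence[float]],
                           target_ids: Sequence[int],
                           weights: Sequence[float]) -> float:
    """Weighted mean of per-token negative log-likelihoods."""
    total = 0.0
    for step_logits, tgt, w in zip(logits, target_ids, weights):
        total += -w * log_softmax(step_logits)[tgt]
    return total / sum(weights)


# Made-up two-step example: the model is confident on the word-internal token
# and undecided on the word-initial one, so upweighting boundaries raises the loss.
tokens = ["ም", "Ġዓ"]
logits = [[10.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
plain = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, token_weights(tokens))
```

In a PyTorch training loop the same idea is usually expressed by calling `torch.nn.functional.cross_entropy` with `reduction="none"` and multiplying the per-token losses by the weight vector before averaging.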
Topics
- TrOCR Adaptation
- Tigrinya Text Recognition
- Ge'ez Script OCR
- Word-Aware Loss Weighting
- Cross-Script Transfer Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.