Adapting TrOCR for Printed Tigrinya Text Recognition: Word-Aware Loss Weighting for Cross-Script Transfer Learning

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

Researchers have successfully adapted the Transformer-based Optical Character Recognition (TrOCR) model for printed Tigrinya text, which uses the Ge'ez script. This marks the first such adaptation for an African syllabic writing system. The process involved extending TrOCR's byte-level BPE tokenizer to include 230 Ge'ez characters and introducing a novel Word-Aware Loss Weighting mechanism. This weighting specifically addresses word-boundary recognition issues that arise when applying Latin-centric BPE conventions to new scripts. The adapted TrOCR-Printed model achieved a 0.22% Character Error Rate and 97.20% exact match accuracy on a 5,000-image synthetic test set from the GLOCR dataset. The entire adaptation pipeline trains in under three hours on a single 8 GB consumer GPU, with all code and model weights publicly released.
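
The two reported metrics can be made concrete with a short sketch. It assumes the Character Error Rate is computed at corpus level (total character edits divided by total reference characters) and that exact match means the predicted line equals the reference string. The summary does not say which convention the authors used, and the function names and sample strings here are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # delete r
                            curr[j - 1] + 1,     # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER (assumed convention): total edits / total reference chars."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(refs, hyps):
    """Fraction of lines predicted with no error at all."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Illustrative strings only: the first hypothesis drops the space between two
# words, which is the kind of word-boundary error the summary describes.
refs = ["ሰላም ዓለም", "ትግርኛ"]
hyps = ["ሰላምዓለም", "ትግርኛ"]

print(f"CER: {cer(refs, hyps):.4f}")
print(f"Exact match: {exact_match(refs, hyps):.2f}")
```

A single missing space costs only one character edit, so it barely moves CER, but it fails the whole line on exact match. That is why the two numbers are reported together.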

Key takeaway

Research scientists working on OCR for scripts outside Latin and CJK can adapt an existing Transformer model such as TrOCR by extending its tokenizer and adding script-specific loss weighting. In the Tigrinya case, Word-Aware Loss Weighting resolved systematic word-boundary errors, and the full pipeline trained in under three hours on a single 8 GB consumer GPU.

Key insights

Adapting TrOCR for Tigrinya text requires tokenizer extension and Word-Aware Loss Weighting to overcome script-specific challenges.

Principles

Method

Extend TrOCR's byte-level BPE tokenizer for new characters and apply Word-Aware Loss Weighting to resolve systematic word-boundary failures during cross-script adaptation.
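
One way to picture the loss-weighting idea is the sketch below. It is not the authors' formulation. It assumes that word-initial tokens carry the `Ġ` space marker used by byte-level BPE vocabularies, and that these tokens receive a larger weight in the cross-entropy average. The weight value `2.0`, the function names and the sample probabilities are all hypothetical.

```python
import math

# Byte-level BPE vocabularies prefix a token with "Ġ" when it follows a space.
BOUNDARY_MARKER = "Ġ"


def token_weights(tokens, boundary_weight=2.0):
    """Per-token loss weights: word-initial tokens count more (value is hypothetical)."""
    return [boundary_weight if t.startswith(BOUNDARY_MARKER) else 1.0
            for t in tokens]


def weighted_cross_entropy(target_probs, weights):
    """Weighted mean of -log p(target token), normalised by the weight sum."""
    total = sum(w * -math.log(p) for p, w in zip(target_probs, weights))
    return total / sum(weights)


# Illustrative target sequence: two words, the second starting at "Ġዓ".
tokens = ["ሰ", "ላ", "ም", "Ġዓ", "ለ", "ም"]
# Hypothetical model probabilities for each correct token; the model is
# least sure about the token that carries the word boundary.
probs = [0.9, 0.9, 0.9, 0.5, 0.9, 0.9]

weights = token_weights(tokens)
plain_loss = weighted_cross_entropy(probs, [1.0] * len(tokens))
word_aware_loss = weighted_cross_entropy(probs, weights)

print(f"plain loss:      {plain_loss:.4f}")
print(f"word-aware loss: {word_aware_loss:.4f}")
```

Because the uncertain token sits at a word boundary, the weighted loss is higher than the plain one, so training pushes harder on exactly the positions where spaces get lost. In a real fine-tuning run the same weights would multiply the per-token losses produced by the training framework.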

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.