DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
Summary
DharmaOCR Full (7B) and DharmaOCR Lite (3B) are specialized small language models (SSLMs) for structured Optical Character Recognition (OCR), designed to balance transcription quality, generation stability, and inference cost. On the DharmaOCR-Benchmark, which covers printed, handwritten, and legal/administrative documents, both models outperform all evaluated open-source and commercial baselines: DharmaOCR Full scored 0.925 with a 0.40% degeneration rate, and DharmaOCR Lite scored 0.911 with a 0.20% degeneration rate. A key methodological contribution is the first application of Direct Preference Optimization (DPO) to OCR, using degenerate generations as rejected examples to penalize looping behavior. Combined with Supervised Fine-Tuning (SFT) for strict JSON schema enforcement, this reduced degeneration rates by up to 87.6% while maintaining or improving extraction quality. In addition, AWQ quantization reduced per-page cost by up to 22% with minimal quality loss.
Key takeaway
For AI engineers building structured OCR systems, DharmaOCR's approach offers a concrete path to better output quality and lower operating cost. Consider combining Supervised Fine-Tuning (SFT), to enforce the output schema, with Direct Preference Optimization (DPO), to minimize text degeneration; fewer degenerate outputs directly improves throughput and reduces compute expense. AWQ quantization can further reduce per-page inference cost by up to 22% without significant quality loss.
Key insights
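Specialized small language models trained with SFT and DPO improve both the quality and the generation stability of structured OCR, outperforming all evaluated open-source and commercial baselines on the DharmaOCR-Benchmark.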
Principles
- DPO can penalize looping behavior in generation tasks.
- Strict JSON schema enforcement improves output consistency.
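To illustrate what strict schema enforcement means at inference time, here is a minimal sketch of an output check using only the Python standard library. The field names `page` and `text` are hypothetical; the summary does not specify DharmaOCR's actual schema.

```python
import json

# Hypothetical schema for illustration only: the source does not
# list the fields DharmaOCR actually emits.
REQUIRED_FIELDS = {"page": int, "text": str}


def validate_ocr_output(raw: str) -> tuple[bool, str]:
    """Return (ok, reason) for a model output that should be strict JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = set(REQUIRED_FIELDS) - set(data)
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        return False, f"unexpected fields: {sorted(extra)}"
    for name, expected in REQUIRED_FIELDS.items():
        # Exact type match, so that True/False is not accepted as an int.
        if type(data[name]) is not expected:
            return False, f"field {name!r} is not {expected.__name__}"
    return True, "ok"
```

A check like this could also serve as a filter when assembling SFT training data, so that only schema-conformant targets are used; that use is an assumption, not something the summary describes.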
Method
The method combines Supervised Fine-Tuning (SFT) for JSON schema enforcement with Direct Preference Optimization (DPO), using degenerate generations as rejected examples to reduce looping behavior in OCR output.
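The pairing idea can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the n-gram loop heuristic, its thresholds, and the choice of the reference transcription as the "chosen" side are all assumptions, since the summary states only that degenerate generations serve as rejected examples.

```python
from collections import Counter


def is_degenerate(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Heuristic loop detector (an assumption, not the paper's definition):
    flag text in which any word n-gram occurs more than max_repeats times."""
    words = text.split()
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in ngrams.values())


def build_preference_pairs(samples):
    """Build DPO-style records from sampled generations.

    Each sample is a dict with the hypothetical keys 'prompt', 'reference'
    (a trusted transcription), and 'generations' (model outputs).
    Degenerate generations become 'rejected'; the reference is 'chosen'.
    """
    pairs = []
    for sample in samples:
        for generation in sample["generations"]:
            if is_degenerate(generation):
                pairs.append({
                    "prompt": sample["prompt"],
                    "chosen": sample["reference"],
                    "rejected": generation,
                })
    return pairs
```

The resulting `prompt` / `chosen` / `rejected` records follow the layout commonly used for preference datasets, so they could be passed to a DPO training loop of your choice.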
In practice
- Apply DPO, with degenerate generations as rejected examples, to reduce text degeneration in generative models.
- Use AWQ quantization for cost-effective OCR inference.
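For readers unfamiliar with DPO, the standard per-pair objective can be written in a few lines. This is a minimal sketch of the general DPO loss, not DharmaOCR's training code; the `beta` value and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities under the policy being
    trained and under a frozen reference model. beta=0.1 is an
    illustrative default, not a value taken from the source.
    """
    # How much more the policy favors the chosen output over the rejected
    # one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # computed in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative numbers: the policy has moved toward the clean transcription
# and away from the looping one, so the loss falls below log(2).
print(dpo_loss(-5.0, -20.0, -10.0, -10.0))
```

When the rejected example is a looping generation, lowering this loss pushes probability mass away from the loop and toward the clean output, which is the mechanism behind the first item above.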
Topics
- DharmaOCR
- Structured OCR
- Small Language Models
- Direct Preference Optimization
- Text Degeneration
Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.