DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines

2026-04-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision, Natural Language Processing · Depth: Expert, quick

Summary

DharmaOCR Full (7B) and DharmaOCR Lite (3B) are specialized small language models (SSLMs) for structured Optical Character Recognition (OCR), optimized jointly for transcription quality, generation stability, and inference cost. On the DharmaOCR-Benchmark, which covers printed, handwritten, and legal/administrative documents, both models set a new state of the art, outperforming every evaluated open-source and commercial baseline. DharmaOCR Full scored 0.925 with a 0.40% degeneration rate, and DharmaOCR Lite scored 0.911 with a 0.20% degeneration rate. A key methodological contribution is the first application of Direct Preference Optimization (DPO) to OCR: degenerate generations serve as the rejected examples, so training explicitly penalizes looping behavior. Combined with Supervised Fine-Tuning (SFT) that enforces a strict JSON schema, this reduced degeneration rates by up to 87.6% while maintaining or improving extraction quality. In addition, AWQ quantization reduced per-page cost by up to 22% with minimal quality loss.
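
The summary does not say how a generation is classified as degenerate. As a minimal sketch, assume a simple looping heuristic: an output is flagged when its tail consists of one short block of whitespace-separated tokens repeated several times in a row, and the degeneration rate is the share of flagged outputs. The function names, the max_period and min_repeats parameters, and the heuristic itself are hypothetical, not the paper's definition.

```python
from typing import Sequence


def is_degenerate(text: str, max_period: int = 8, min_repeats: int = 4) -> bool:
    """Hypothetical looping check: does the output end with a block of
    1..max_period tokens repeated at least min_repeats times in a row?"""
    tokens = text.split()
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(tokens) < needed:
            break  # longer periods need even more tokens
        tail = tokens[-needed:]
        unit = tail[:period]
        # The tail is a loop if every consecutive block equals the first one.
        if all(tail[i:i + period] == unit for i in range(0, needed, period)):
            return True
    return False


def degeneration_rate(outputs: Sequence[str]) -> float:
    """Fraction of generations flagged as degenerate (0.0 for an empty batch)."""
    if not outputs:
        return 0.0
    return sum(is_degenerate(o) for o in outputs) / len(outputs)


outputs = [
    '{"text": "Contract signed on 2024-01-10"}',
    "page 1 of 2 page 1 of 2 page 1 of 2 page 1 of 2",
]
print(degeneration_rate(outputs))
```

A production detector would more likely work on model tokens and also catch runs that hit the generation length limit; the tail-repetition test is kept here only because it is easy to verify by hand.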

Key takeaway

For AI engineers building structured OCR systems, DharmaOCR's approach offers a clear path to higher output quality and lower operational cost. Consider pairing Supervised Fine-Tuning (SFT), which enforces the output schema, with Direct Preference Optimization (DPO), which suppresses text degeneration; looping outputs waste generated tokens, so reducing them directly improves throughput and lowers compute cost. AWQ quantization can then cut per-page inference cost by up to 22% without significant quality loss.

Key insights

Specialized small language models trained with SFT and DPO significantly improve both the quality and the stability of structured OCR.

Principles

DPO can penalize looping behavior in generation tasks when degenerate outputs are used as the rejected examples.
Strict JSON schema enforcement improves output consistency.
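
The summary does not publish DharmaOCR's output schema. As a minimal sketch, assume a hypothetical page schema with three required, typed fields; a strict validator then rejects any output that fails to parse, misses a field, has the wrong type, or adds unexpected keys. The PAGE_SCHEMA fields and the validate_page helper are illustrative only. A check like this could gate outputs at inference time or filter SFT targets.

```python
import json
from typing import Any

# Hypothetical page-level schema: field name -> required Python type.
PAGE_SCHEMA: dict[str, type] = {
    "document_type": str,
    "page_number": int,
    "text": str,
}


def validate_page(raw: str, schema: dict[str, type] = PAGE_SCHEMA) -> list[str]:
    """Return schema violations for one model output; an empty list means valid."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors: list[str] = []
    for key, expected in schema.items():
        if key not in data:
            errors.append(f"missing field: {key}")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"wrong type for {key}: expected {expected.__name__}")
    # Strict mode: any key outside the schema is also a violation.
    for key in sorted(data.keys() - schema.keys()):
        errors.append(f"unexpected field: {key}")
    return errors


print(validate_page('{"document_type": "contract", "page_number": "1"}'))
```

Returning a list of violations instead of a single boolean makes it easy to log why an output was rejected, which helps when deciding whether a failure is a schema problem or a degenerate generation.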

Method

The method combines Supervised Fine-Tuning (SFT) for JSON schema enforcement with Direct Preference Optimization (DPO), using degenerate generations as rejected examples to reduce looping behavior in OCR output.
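
To make the preference setup concrete, here is a minimal sketch of the standard DPO objective for one pair, assuming the chosen response is a correct JSON transcription and the rejected response is a looping generation for the same page. The PreferencePair record mirrors the prompt/chosen/rejected layout common in preference-tuning datasets; the helper names and the beta default of 0.1 are assumptions, not values reported for DharmaOCR. The loss is -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]).

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One hypothetical DPO example for a single document page."""
    prompt: str
    chosen: str    # e.g. the ground-truth JSON transcription
    rejected: str  # e.g. a degenerate, looping generation from the SFT model


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, from sequence-level log-probabilities.

    Each *_logp is log p(response | prompt), summed over response tokens,
    under the trainable policy or the frozen reference model (e.g. the SFT model).
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)), written to avoid overflow for large |z|.
    return math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))


pair = PreferencePair(
    prompt="<page image tokens>",
    chosen='{"text": "Clause 1. The parties agree..."}',
    rejected='{"text": "Clause 1. Clause 1. Clause 1. Clause 1.',
)
# Policy raised the chosen likelihood and lowered the looping one versus the reference.
print(dpo_loss(-10.0, -50.0, -12.0, -40.0))
```

When the policy matches the reference, z is 0 and the loss is log 2; it falls below log 2 only as the policy shifts probability away from the looping response and toward the correct one, which is how degenerate rejected examples push the model away from repetition.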

In practice

Apply DPO to reduce text degeneration in generative models.
Use AWQ quantization for cost-effective OCR inference.
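
The summary reports per-page savings but not the cost model behind them. A minimal sketch, assuming per-page cost is the hourly price of the serving hardware divided by pages processed per hour; the prices and throughputs below are hypothetical, not figures from the paper. Under this model, a 22% lower per-page cost at the same hourly price corresponds to roughly 1.28x the throughput, since 1 / (1 - 0.22) ≈ 1.282. A 4-bit quantized model can also fit on cheaper hardware, which lowers the hourly price instead.

```python
def cost_per_page(hourly_price_usd: float, pages_per_hour: float) -> float:
    """Per-page cost under a simple model: hardware price / throughput."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return hourly_price_usd / pages_per_hour


def relative_saving(baseline_cost: float, new_cost: float) -> float:
    """Fractional cost reduction of new_cost versus baseline_cost."""
    return 1.0 - new_cost / baseline_cost


# Hypothetical numbers: same GPU price, the quantized model serves more pages per hour.
baseline = cost_per_page(hourly_price_usd=2.0, pages_per_hour=1000)
quantized = cost_per_page(hourly_price_usd=2.0, pages_per_hour=1282)
print(f"saving: {relative_saving(baseline, quantized):.1%}")
```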

Topics

DharmaOCR
Structured OCR
Small Language Models
Direct Preference Optimization
Text Degeneration

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer



Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.