Local OCR Comparison: dots.ocr More Accurate, DeepSeek-OCR 2 Faster (Sparrow + MLX)
Summary
A local comparison of dots.ocr and DeepSeek-OCR 2, both running in BF16 through MLX-VLM on a Mac Mini M4 Pro with 64 GB of memory, shows a clear trade-off between accuracy and speed. dots.ocr is consistently more accurate, particularly on complex tables where a value spans multiple rows or a cell includes descriptive sub-text, and it returns structured JSON whose data blocks are formatted as markdown. It is also much slower, taking 30-51 seconds per document across the test set. DeepSeek-OCR 2 finishes in 6.5-11 seconds per document and returns a single markdown output for the entire file, but it sometimes struggles to group multi-line rows and may omit details such as currency signs. The comparison used bank statements, bond market data, lab results, and financial statements.
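To put the reported latencies in perspective, the short sketch below converts them into sequential throughput for a single model instance. Only the four latency figures come from the summary; the helper names are illustrative, and the calculation assumes documents are processed one at a time with no batching or model load overhead.

```python
# Latency ranges reported in the summary, in seconds per document:
# (fastest observed, slowest observed).
LATENCY_S = {
    "dots.ocr": (30.0, 51.0),
    "DeepSeek-OCR 2": (6.5, 11.0),
}


def docs_per_hour(seconds_per_doc: float) -> float:
    """Sequential throughput of one model instance (assumes no batching)."""
    return 3600.0 / seconds_per_doc


def throughput_range(model: str) -> tuple[float, float]:
    """Return (lowest, highest) documents per hour for a model."""
    fastest, slowest = LATENCY_S[model]
    return docs_per_hour(slowest), docs_per_hour(fastest)


if __name__ == "__main__":
    for model in LATENCY_S:
        low, high = throughput_range(model)
        print(f"{model}: {low:.0f} to {high:.0f} documents/hour")
```

Under these assumptions, the slower model handles on the order of a hundred documents per hour and the faster one several hundred, which is the gap to weigh against the accuracy differences described above.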
Key takeaway
For MLOps engineers deploying OCR: if your application needs accurate extraction from complex, multi-row tables, or must preserve details such as currency signs, prioritize dots.ocr despite its slower inference. If processing speed matters most and the documents are simpler, DeepSeek-OCR 2 is the faster option. Consider integrating both models and selecting the engine per document, based on its complexity and your latency requirements.
Key insights
dots.ocr offers higher accuracy on complex tables, while DeepSeek-OCR 2 provides faster inference on simpler documents.
Principles
- Accuracy often trades off with inference speed.
- Structured JSON output simplifies data parsing.
Method
Both models were run locally in BF16 through MLX-VLM on a Mac Mini M4 Pro. Each processed the same set of document types (bank statements, bond market data, lab results, and financial statements), and the results were compared for accuracy and processing time.
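A comparison like this depends on measuring per-document wall-clock time the same way for each model. The sketch below shows one way to do that. It is not the harness used in the original article: `time_ocr` and `fake_ocr` are hypothetical names, and `fake_ocr` is a stand-in that you would replace with a real call into whichever OCR backend you run.

```python
import time
from typing import Callable, Dict, Iterable


def time_ocr(run_ocr: Callable[[str], str], paths: Iterable[str]) -> Dict[str, float]:
    """Run an OCR callable on each path and record elapsed seconds per document."""
    timings: Dict[str, float] = {}
    for path in paths:
        start = time.perf_counter()
        run_ocr(path)  # output is discarded; only the latency is measured here
        timings[path] = time.perf_counter() - start
    return timings


def fake_ocr(path: str) -> str:
    """Hypothetical stand-in for a real model call; sleeps briefly instead."""
    time.sleep(0.01)
    return f"# text extracted from {path}"


if __name__ == "__main__":
    # File names are placeholders, not files from the original comparison.
    results = time_ocr(fake_ocr, ["bank_statement.png", "lab_results.png"])
    for path, seconds in results.items():
        print(f"{path}: {seconds:.3f} s")
```

Passing the model as a callable keeps the timing logic identical across engines, so differences in the numbers reflect the models rather than the measurement code.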
In practice
- Use dots.ocr for high-accuracy table extraction.
- Opt for DeepSeek-OCR 2 when speed is critical.
- Integrate both models for diverse document processing.
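The three recommendations above can be combined into a simple per-document routing rule. The following is a minimal sketch of one possible policy, not something described in the original article: `DocumentProfile`, `choose_engine`, and the field names are hypothetical, and the rule that a tight latency budget overrides accuracy needs is an assumption you may want to reverse. The 51-second figure is the slowest dots.ocr time reported in the summary.

```python
from dataclasses import dataclass
from typing import Optional

DOTS_OCR = "dots.ocr"
DEEPSEEK_OCR_2 = "DeepSeek-OCR 2"

# Slowest dots.ocr run reported in the summary, used as a worst-case estimate.
DOTS_OCR_WORST_CASE_S = 51.0


@dataclass
class DocumentProfile:
    """Hypothetical description of a document, filled in by the caller."""
    has_multirow_tables: bool
    needs_currency_symbols: bool
    latency_budget_s: Optional[float] = None  # None means no latency limit


def choose_engine(profile: DocumentProfile) -> str:
    """Pick an OCR engine name for one document."""
    budget = profile.latency_budget_s
    # Assumed policy: a budget too tight for dots.ocr forces the faster model.
    if budget is not None and budget < DOTS_OCR_WORST_CASE_S:
        return DEEPSEEK_OCR_2
    # Otherwise route accuracy-sensitive documents to dots.ocr.
    if profile.has_multirow_tables or profile.needs_currency_symbols:
        return DOTS_OCR
    return DEEPSEEK_OCR_2


if __name__ == "__main__":
    statement = DocumentProfile(has_multirow_tables=True, needs_currency_symbols=True)
    memo = DocumentProfile(has_multirow_tables=False, needs_currency_symbols=False)
    print(choose_engine(statement))
    print(choose_engine(memo))
```

How a document's profile is determined (manual tagging, document type, a cheap first pass) is left open here; the sketch only shows the selection step once that information is available.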
Topics
- OCR Models
- Document Processing
- Inference Performance
- Table Extraction
- MLX Framework
Best for: Machine Learning Engineer, Data Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.