GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?
Summary
This comparison covers two new Optical Character Recognition (OCR) models, GLM OCR and DeepSeek OCR 2, both available through MLX-VLM and runnable on Apple Silicon and Linux. The evaluation ran on a 64GB Mac Mini M4 Pro and measured how well each model converts documents to Markdown, using three inputs: a simple table, a complex bank statement, and a financial statement. GLM OCR, approximately 2GB in size, handled the simple table well (7.7 seconds) but failed on the bank statement and the financial statement, often producing corrupted or incomplete data. DeepSeek OCR 2, also in BF16 precision, processed all three document types accurately: the financial statement in 8 seconds, the bank statement in 12 seconds, and the simple table in 3.5 seconds, roughly twice as fast as GLM OCR on the same table.
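The "twice as fast" figure follows directly from the reported timings. A minimal sketch of the arithmetic, using only the numbers quoted in the summary (the dictionary layout and variable names are illustrative, not from the original article):

```python
# Timings in seconds, as reported in the summary above.
# GLM OCR has no entries for the complex documents because it
# did not produce accurate output for them.
timings = {
    "GLM OCR": {"simple table": 7.7},
    "DeepSeek OCR 2": {
        "simple table": 3.5,
        "financial statement": 8.0,
        "bank statement": 12.0,
    },
}

# Speedup of DeepSeek OCR 2 over GLM OCR on the one document both handled.
speedup = timings["GLM OCR"]["simple table"] / timings["DeepSeek OCR 2"]["simple table"]
print(f"DeepSeek OCR 2 speedup on the simple table: {speedup:.1f}x")
```

These are single measurements from one machine, so the ratio is best read as a rough indication rather than a benchmark result.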
Key takeaway
For AI Engineers and Machine Learning Engineers building local document processing applications on Apple Silicon, DeepSeek OCR 2 is the clear choice over GLM OCR. It was faster and more accurate across every document type tested, including large tables, which makes it the more reliable option for converting documents to Markdown. Specialized OCR models of this kind can also help minimize the data hallucinations associated with larger vision-language models.
Key insights
DeepSeek OCR 2 outperforms GLM OCR in speed and accuracy for local document processing on Apple Silicon.
Principles
- Model performance varies significantly with document complexity.
- Smaller, specialized models can outperform general vision-language models on specific tasks.
Method
The evaluation used the same "convert document to markdown" prompt for both models and tested them against three document types: a simple table, a complex bank statement, and a financial statement.
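The method above can be sketched as a small timing harness. This is a minimal illustration, not the original author's script: `benchmark`, `stub_backend`, and the file paths are hypothetical names, and each backend is assumed to be a plain callable that takes an image path and a prompt and returns Markdown. A real run would replace the stub with calls into whichever inference library is used.

```python
import time
from typing import Callable, Dict

# Prompt quoted in the summary; the same string is sent to every model.
PROMPT = "convert document to markdown"

# Hypothetical file paths standing in for the three test documents.
DOCUMENTS = {
    "simple table": "simple_table.png",
    "bank statement": "bank_statement.png",
    "financial statement": "financial_statement.png",
}

# A backend is any callable: (image_path, prompt) -> markdown string.
Backend = Callable[[str, str], str]


def benchmark(
    models: Dict[str, Backend],
    documents: Dict[str, str],
    prompt: str = PROMPT,
) -> Dict[str, Dict[str, dict]]:
    """Run every model on every document with one shared prompt and time it."""
    results: Dict[str, Dict[str, dict]] = {}
    for model_name, run_ocr in models.items():
        results[model_name] = {}
        for doc_name, image_path in documents.items():
            start = time.perf_counter()
            markdown = run_ocr(image_path, prompt)
            elapsed = time.perf_counter() - start
            results[model_name][doc_name] = {
                "seconds": elapsed,
                "markdown": markdown,
            }
    return results


def stub_backend(image_path: str, prompt: str) -> str:
    """Placeholder backend (assumption): returns a fixed Markdown table."""
    return "| Date | Amount |\n| --- | --- |\n| 2024-01-01 | 10.00 |"


if __name__ == "__main__":
    report = benchmark({"stub": stub_backend}, DOCUMENTS)
    for model_name, per_doc in report.items():
        for doc_name, entry in per_doc.items():
            print(f"{model_name} / {doc_name}: {entry['seconds']:.3f}s")
```

Keeping the prompt and document set fixed while swapping only the backend is what makes the per-document timings and outputs comparable between models.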
In practice
- DeepSeek OCR 2 can be integrated for faster, more accurate table processing.
- Consider specialized OCR models to reduce reliance on full vision-language models.
Topics
- DeepSeek OCR 2
- GLM OCR
- Apple Silicon Inference
- Document Processing
- Table Recognition
Best for: Machine Learning Engineer, AI Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.