GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?

· Source: Andrej Baranovskij · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

This comparison covers two new Optical Character Recognition (OCR) models, GLM-OCR and DeepSeek OCR 2, both available through MLX-VLM and runnable on Apple Silicon and Linux. The evaluation ran on a 64GB Mac Mini M4 Pro and measured how well each model converts documents to Markdown, using three inputs: a simple table, a complex bank statement, and a financial statement. GLM-OCR, approximately 2GB in size, handled the simple table well (7.7 seconds) but failed on the more complex bank statement and financial statement, often producing corrupted or incomplete data. DeepSeek OCR 2, also BF16, processed all three document types accurately, taking 8 seconds for the financial statement and 12 seconds for the bank statement, and was roughly twice as fast as GLM-OCR on the simple table (3.5 seconds versus 7.7 seconds).
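To make the "twice as fast" claim concrete, the short sketch below computes the speedup from the two simple-table timings reported in the summary. The variable names are illustrative only, and the timings come from a single reported run, so treat the ratio as indicative rather than a benchmark result.

```python
# Simple-table timings as reported in the summary (seconds).
glm_simple_s = 7.7
deepseek_simple_s = 3.5

# Speedup of DeepSeek OCR 2 relative to GLM OCR on the simple table.
speedup = glm_simple_s / deepseek_simple_s

print(f"DeepSeek OCR 2 speedup on the simple table: {speedup:.1f}x")
```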

Key takeaway

For AI Engineers and Machine Learning Engineers building local document processing applications on Apple Silicon, DeepSeek OCR 2 is the clear choice over GLM-OCR in this comparison. It was faster and more accurate across documents of varying complexity, including large tables, which makes it the more reliable option for converting documents to Markdown and for minimizing data hallucinations compared to larger vision-language models.

Key insights

DeepSeek OCR 2 outperforms GLM-OCR in both speed and accuracy for local document processing on Apple Silicon.

Method

The evaluation used the same "convert document to markdown" prompt for both models and tested each against three document types: a simple table, a complex bank statement, and a financial statement.
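The method described above can be sketched as a small timing harness. This is a minimal illustration, not the author's actual script: `run_ocr` is a hypothetical callable standing in for whatever inference call you use (for example, a wrapper around MLX-VLM), and the model labels and file paths are placeholders. The stub `fake_run_ocr` exists only so the sketch runs without any model installed.

```python
import time
from typing import Callable, Dict, List

# The same prompt is sent to every model, as in the evaluation.
PROMPT = "convert document to markdown"

def benchmark(
    run_ocr: Callable[[str, str, str], str],
    models: List[str],
    documents: Dict[str, str],
    prompt: str = PROMPT,
) -> List[dict]:
    """Run every model on every document and record the wall-clock time."""
    results = []
    for model in models:
        for doc_type, path in documents.items():
            start = time.perf_counter()
            markdown = run_ocr(model, path, prompt)  # hypothetical inference call
            elapsed = time.perf_counter() - start
            results.append({
                "model": model,
                "document": doc_type,
                "seconds": elapsed,
                "markdown": markdown,
            })
    return results

def fake_run_ocr(model: str, image_path: str, prompt: str) -> str:
    """Stub that returns a tiny Markdown table; replace with real inference."""
    return f"| model | file |\n|---|---|\n| {model} | {image_path} |"

# Placeholder labels and paths (assumptions, not real model identifiers).
MODELS = ["glm-ocr", "deepseek-ocr-2"]
DOCUMENTS = {
    "simple table": "simple_table.png",
    "bank statement": "bank_statement.png",
    "financial statement": "financial_statement.png",
}

results = benchmark(fake_run_ocr, MODELS, DOCUMENTS)
for r in results:
    print(f"{r['model']:>15} | {r['document']:<20} | {r['seconds']:.3f}s")
```

Keeping the prompt and document set fixed while varying only the model is what makes the timings comparable. Accuracy still has to be judged by inspecting the returned Markdown against the source document, since elapsed time alone says nothing about corrupted or missing rows.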

Best for: Machine Learning Engineer, AI Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.