GLM-OCR (0.9B) - Local OCR Test | OCR, Document Extraction, Table Recognition

2026-02-09 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

Z.ai has introduced GLM-OCR, a new open-source optical character recognition (OCR) model, positioned as outperforming previously dominant models such as PaddleOCR and DeepSeek-OCR. GLM-OCR operates as a two-stage pipeline: it first performs document analysis to identify structural elements such as titles, paragraphs, tables, and figures, then recognizes characters within each identified layout element. At 0.9 billion parameters, the model is compact enough to deploy on most modern GPUs, and it is licensed under MIT. Benchmarks suggest strong performance, particularly on complex tables, code, figures, and charts. A practical evaluation in a Google Colab notebook on a T4 GPU covered text recognition, table recognition, and custom schema extraction into JSON format. In that test, the full 16-bit floating point version occupied approximately 2.2 GB of VRAM.
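
The roughly 2.2 GB VRAM figure can be sanity-checked with simple arithmetic. The sketch below is not from the original video. It assumes exactly 0.9 billion parameters and counts the weights only, so runtime overhead such as activations and the CUDA context is ignored, which plausibly explains the gap between the estimate and the observed figure. The function name and the 8-bit and 4-bit rows are hypothetical illustrations of what quantized variants might need.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count quoted in the summary (assumed exact for this estimate).
PARAMS = 0.9e9

# 16-bit is the full-precision case from the test; 8 and 4 bits are
# hypothetical quantization levels, shown only for comparison.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.2f} GB")
```

This prints about 1.80 GB for 16-bit weights, 0.90 GB for 8-bit, and 0.45 GB for 4-bit, so the 16-bit weights alone account for most of the 2.2 GB observed on the T4.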

Key takeaway

For AI engineers building RAG systems or deploying OCR solutions, GLM-OCR is a compelling open-source option. Its small size (0.9 billion parameters) and MIT license make it easy to deploy on standard GPUs, while its two-stage pipeline and custom data extraction can improve accuracy and flexibility across diverse document types, including complex tables and receipts. Consider GLM-OCR when you need efficient extraction of both text and structured data.

Key insights

GLM-OCR is a small, MIT-licensed, two-stage open-source model that excels at complex documents and custom data extraction.

Principles

Two-stage OCR pipelines (layout analysis first, then recognition) can improve accuracy.
Smaller models can achieve leading benchmark performance.

Method

GLM-OCR uses a two-stage pipeline: first, document analysis identifies structural elements (titles, paragraphs, tables, figures); then OCR recognizes the characters within each of those layout elements.
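
The two-stage flow described above can be sketched in a few lines. This is a minimal illustration of the control flow only, not the model's actual API: the `Region` and `RecognizedRegion` types, the `run_two_stage` function, and the two stand-in callables are all hypothetical, and sorting regions into top-to-bottom reading order is an added assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Region:
    kind: str  # e.g. "title", "paragraph", "table", "figure"
    bbox: BBox


@dataclass
class RecognizedRegion:
    kind: str
    bbox: BBox
    text: str


def run_two_stage(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    recognize_region: Callable[[Any, Region], str],
) -> List[RecognizedRegion]:
    # Stage 1: find the structural elements on the page.
    regions = detect_layout(page)
    # Assumption: order regions top to bottom, then left to right.
    regions = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    # Stage 2: recognize the text inside each region separately.
    return [
        RecognizedRegion(r.kind, r.bbox, recognize_region(page, r))
        for r in regions
    ]


# Toy stand-ins so the sketch runs without a model: the "page" is a dict
# that maps each bounding box to the text it contains.
fake_page = {(0, 0, 100, 20): "Invoice", (0, 30, 100, 80): "Item | Price"}


def fake_layout(page: dict) -> List[Region]:
    return [Region("table", (0, 30, 100, 80)), Region("title", (0, 0, 100, 20))]


def fake_recognize(page: dict, region: Region) -> str:
    return page[region.bbox]


results = run_two_stage(fake_page, fake_layout, fake_recognize)
for item in results:
    print(item.kind, "->", item.text)
```

Keeping the two stages behind separate callables means the recognizer only ever sees one layout element at a time, which is the property the method relies on.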

In practice

Use GLM-OCR for complex table and code extraction.
Extract custom data fields into JSON format.
Quantized versions can further speed up inference.
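
Custom field extraction into JSON usually needs a small amount of glue code around the model. The sketch below shows one possible shape for that glue, assuming the model is prompted with a JSON template and replies with a JSON object that may be surrounded by extra text. The schema, prompt wording, helper names, and sample reply are all invented for illustration and are not taken from the original video or from the model's documentation.

```python
import json

# Hypothetical schema: the fields we want pulled from a receipt.
RECEIPT_SCHEMA = {"merchant": "", "date": "", "total": ""}


def build_prompt(schema: dict) -> str:
    """Turn a field template into an instruction for the OCR model."""
    return (
        "Extract the following fields and answer with JSON only:\n"
        + json.dumps(schema, indent=2)
    )


def parse_reply(reply: str, schema: dict) -> dict:
    """Pull the JSON object out of a reply and keep only schema fields."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    missing = set(schema) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Drop any extra keys the model added beyond the schema.
    return {key: data[key] for key in schema}


# Invented reply standing in for real model output.
sample_reply = (
    'Here is the result: {"merchant": "Corner Cafe", '
    '"date": "2026-01-15", "total": "12.40", "note": "paid by card"}'
)

print(build_prompt(RECEIPT_SCHEMA))
fields = parse_reply(sample_reply, RECEIPT_SCHEMA)
print(fields)
```

Validating the reply against the schema before using it catches missing fields early, which matters when the extracted values feed a downstream system such as a RAG index.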

Topics

GLM-OCR
Optical Character Recognition
Document Analysis
Data Extraction
Model Performance

Best for: AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.