GLM-OCR (0.9B) - Local OCR Test | OCR, Document Extraction, Table Recognition
Summary
Z.ai has introduced GLM-OCR, a new open-source optical character recognition (OCR) model positioned as outperforming previously dominant models such as PaddleOCR and DeepSeek-OCR. GLM-OCR operates as a two-stage pipeline: it first performs document layout analysis to identify structural elements such as titles, paragraphs, tables, and figures, then recognizes the characters within each identified element. At 0.9 billion parameters, the model is compact enough to deploy on most modern GPUs, and it is released under the MIT license. Benchmarks suggest strong performance, particularly on complex tables, code, figures, and charts. A practical evaluation in a Google Colab notebook on a T4 GPU demonstrated text recognition, table recognition, and custom schema extraction into JSON, with the full 16-bit floating-point version occupying approximately 2.2 GB of VRAM.
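The memory figure above can be sanity-checked with simple arithmetic. The sketch below estimates the size of the weights alone from the parameter count and the bytes per parameter. The helper name `weight_memory_gb` is made up for illustration, and attributing the remaining gap to runtime overhead (activations, framework buffers) is an assumption, not something the evaluation measured.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 0.9e9          # 0.9 billion parameters, as stated in the summary
REPORTED_VRAM_GB = 2.2  # approximate VRAM use reported for the 16-bit model

# 16-bit floats take 2 bytes per parameter.
fp16_weights_gb = weight_memory_gb(PARAMS, 2)

# Assumption: whatever is left is runtime overhead, not weights.
overhead_gb = REPORTED_VRAM_GB - fp16_weights_gb

# Rough weight sizes if the model were quantized to 8 or 4 bits.
int8_weights_gb = weight_memory_gb(PARAMS, 1)
int4_weights_gb = weight_memory_gb(PARAMS, 0.5)

print(f"fp16 weights: {fp16_weights_gb:.2f} GB")
print(f"implied overhead: {overhead_gb:.2f} GB")
print(f"int8 weights: {int8_weights_gb:.2f} GB, int4 weights: {int4_weights_gb:.2f} GB")
```

The weights account for most of the reported footprint, which is consistent with the model fitting comfortably on a 16 GB T4.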
Key takeaway
For AI engineers building RAG systems or deploying OCR solutions, GLM-OCR offers a compelling open-source option. Its small 0.9 billion parameter size and MIT license make it easy to deploy on standard GPUs, while its two-stage pipeline and custom data extraction capabilities can improve accuracy and flexibility across diverse document types, including complex tables and receipts. Consider integrating GLM-OCR for efficient text and structured data extraction.
Key insights
GLM-OCR is a small, MIT-licensed, two-stage open-source model that excels at complex documents and custom data extraction.
Principles
- Two-stage OCR pipelines enhance accuracy.
- Smaller models can achieve leading performance.
Method
GLM-OCR uses a two-stage pipeline: first, document layout analysis identifies structural elements (titles, paragraphs, tables, figures); then OCR recognizes the characters within each of those layout elements.
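The control flow of such a two-stage pipeline can be sketched as follows. This is a minimal illustration, not GLM-OCR's actual code: the `Region` and `RecognizedRegion` types, the `run_two_stage` function, and the two callables are all hypothetical, and sorting regions top-to-bottom is an assumed reading order. The stand-in functions in the demo return canned values in place of real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Region:
    kind: str  # e.g. "title", "paragraph", "table", "figure"
    bbox: BBox


@dataclass
class RecognizedRegion:
    kind: str
    bbox: BBox
    text: str


def run_two_stage(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    recognize: Callable[[Any, Region], str],
) -> List[RecognizedRegion]:
    # Stage 1: find the structural elements on the page.
    regions = detect_layout(page)
    # Assumed reading order: top to bottom, then left to right.
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    # Stage 2: recognize the content of each element separately.
    return [RecognizedRegion(r.kind, r.bbox, recognize(page, r)) for r in ordered]


# Stand-ins for the two models, returning canned values.
def fake_detect_layout(page: Any) -> List[Region]:
    return [
        Region("table", (0, 200, 500, 400)),
        Region("title", (0, 0, 500, 50)),
        Region("paragraph", (0, 60, 500, 180)),
    ]


def fake_recognize(page: Any, region: Region) -> str:
    return f"<{region.kind} content>"


results = run_two_stage("page.png", fake_detect_layout, fake_recognize)
for item in results:
    print(item.kind, item.bbox, item.text)
```

Keeping the stages behind separate callables means a table region can be routed to a table-specific recognizer without touching the layout step.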
In practice
- Use GLM-OCR for complex table and code extraction.
- Extract custom data fields into JSON format.
- Quantized versions can further speed up inference.
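Custom field extraction into JSON usually comes down to two steps around the model call: describing the wanted fields in the prompt, and validating the reply. The sketch below shows only those two steps. The prompt wording, the function names, and the simulated reply are all assumptions made for illustration; GLM-OCR's actual prompt format is not described here, and the model call itself is left out.

```python
import json
import re


def build_extraction_prompt(schema: dict) -> str:
    """Describe the wanted fields to the model (hypothetical prompt wording)."""
    return (
        "Extract the following fields from the document and reply with "
        "JSON only, matching this schema:\n" + json.dumps(schema, indent=2)
    )


def parse_extraction_reply(reply: str, schema: dict) -> dict:
    """Pull the JSON object out of a reply and keep only the schema's fields."""
    # Models often wrap JSON in extra text, so take the outermost braces.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    missing = [key for key in schema if key not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return {key: data[key] for key in schema}


# Hypothetical receipt schema: field name -> short description.
receipt_schema = {
    "merchant": "name of the store",
    "date": "purchase date",
    "total": "total amount paid",
}

prompt = build_extraction_prompt(receipt_schema)

# Simulated model reply; a real run would send `prompt` plus the image.
simulated_reply = (
    'Here is the result: {"merchant": "ACME Market", "date": "2024-05-01", '
    '"total": "12.50", "cashier": "unknown"}'
)

fields = parse_extraction_reply(simulated_reply, receipt_schema)
print(fields)
```

Validating against the schema keeps unexpected keys out of downstream code and turns a missing field into an explicit error instead of a silent gap.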
Topics
- GLM-OCR
- Optical Character Recognition
- Document Analysis
- Data Extraction
- Model Performance
Best for: AI Engineer, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.