Now in Foundry: Command A+ (W4A4), Chandra OCR 2, and GLM-OCR
Summary
Microsoft Foundry now offers three new models from Hugging Face, reflecting two trends: low-bit quantization and specialized OCR. Cohere Labs' Command A+ (W4A4) is a 218B-parameter sparse Mixture-of-Experts reasoning model with 25B active parameters per token, a 128K input context, and a 64K output context. It is optimized for agentic, multilingual (48 languages), and multimodal reasoning tasks, and uses W4A4 quantization (4-bit weights and 4-bit activations) for efficient deployment. Datalab's Chandra OCR 2 is a 5.3B-parameter vision-language model that converts images and PDFs to Markdown, HTML, and JSON while preserving layout. It achieves state-of-the-art results on the olmOCR benchmark with a score of 85.9% (77.8% multilingual), supports over 90 languages, and excels at complex layout understanding. Z.ai's GLM-OCR is a compact 0.9B-parameter OCR model built on the GLM-V architecture, roughly 6x smaller than Chandra OCR 2. It ranks first on OmniDocBench V1.5 with a score of 94.62, offering high accuracy for structured document understanding along with efficient, scalable deployment. All three models are deployable via Microsoft Foundry.
Key takeaway
For AI engineers deploying large reasoning models or building document processing pipelines, the three new Hugging Face models in Microsoft Foundry cover distinct needs. If you need to run a 218B-parameter model efficiently, Command A+ (W4A4) relies on 4-bit quantization to reduce deployment cost. For multilingual OCR with complex layout understanding, Chandra OCR 2 is a strong choice. Alternatively, GLM-OCR offers high accuracy at a compact 0.9B scale for high-throughput scenarios. Evaluate these models in Foundry against your own workloads before committing to one.
Key insights
Low-bit quantization is making large models cheaper to deploy, and specialized OCR models are making document understanding more efficient.
Principles
- Low-bit quantization reduces the memory needed to serve large models, so they fit on less hardware.
- Quantization-aware post-training mitigates quantization errors in reasoning models.
- Vision-language models improve layout preservation in OCR.
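The first principle can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, using the 218B total and 25B active parameter counts from the summary. The helper name `weight_memory_gb` and the 16-bit baseline are assumptions for illustration; the estimate ignores quantization scales, activations, and KV cache, so real deployments will need more memory than this.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 218e9   # total parameters, from the summary
ACTIVE_PARAMS = 25e9   # active parameters per token, from the summary

# Assumed baseline: 16 bits per weight. Quantized case: 4 bits per weight.
baseline_gb = weight_memory_gb(TOTAL_PARAMS, 16)
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)

# Fraction of the model's parameters that participate in each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"16-bit weights: ~{baseline_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
print(f"active per token: {active_fraction:.1%} of parameters")
```

Under these assumptions the weights shrink from roughly 436 GB to roughly 109 GB. In a Mixture-of-Experts model the active parameter count mainly reduces compute per token; all experts generally still have to be stored, which is why the estimate uses the total count.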
Method
Command A+ is post-trained against a full-precision teacher's output distribution, using fake quantization in the forward pass and straight-through estimators during backpropagation, to mitigate quantization errors.
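A minimal sketch of the two mechanisms named above, in plain Python on a single scalar weight. This is not Cohere's training code: the symmetric 4-bit grid, the fixed `scale`, the clipped variant of the straight-through estimator, and the squared-error loss (a stand-in for matching the teacher's output distribution) are all assumptions chosen to keep the example small.

```python
def int_range(num_bits: int = 4) -> tuple[int, int]:
    """Signed integer range for the given bit width, e.g. (-8, 7) for 4 bits."""
    return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1


def fake_quantize(x: float, scale: float, num_bits: int = 4) -> float:
    """Forward pass: snap x to the integer grid, clip, then map back to float."""
    qmin, qmax = int_range(num_bits)
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def ste_backward(x: float, scale: float, grad_out: float, num_bits: int = 4) -> float:
    """Backward pass: treat rounding as identity; block gradient where x was clipped."""
    qmin, qmax = int_range(num_bits)
    return grad_out if qmin <= x / scale <= qmax else 0.0


# Toy distillation: a quantized student weight chases a full-precision teacher.
teacher_w = 0.43   # hypothetical teacher weight
student_w = 0.10   # latent full-precision weight kept during training
scale, lr, x = 0.1, 0.1, 1.0

for _ in range(200):
    w_q = fake_quantize(student_w, scale)            # quantized forward pass
    err = w_q * x - teacher_w * x                    # student minus teacher output
    grad_wq = 2.0 * err * x                          # d(err^2) / d(w_q)
    student_w -= lr * ste_backward(student_w, scale, grad_wq)

student_w_q = fake_quantize(student_w, scale)
print(f"latent weight {student_w:.3f}, quantized weight {student_w_q:.1f}")
```

The latent weight stays in full precision and receives the gradient as if rounding were not there, while the forward pass always sees the 4-bit value. The quantized weight can only land on a grid point near the teacher, so some residual error remains.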
In practice
- Deploy Command A+ for agentic, multilingual reasoning.
- Use Chandra OCR 2 for complex document layout extraction.
- Apply GLM-OCR for high-throughput, compact OCR pipelines.
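The guidance above can be expressed as a small selection helper. This is only a sketch of the decision logic: `pick_model`, the task labels, and the `high_throughput` flag are hypothetical, and the returned names are the display names from this summary rather than Foundry deployment identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    name: str     # display name as used in this summary
    reason: str   # why this model fits the task


def pick_model(task: str, high_throughput: bool = False) -> ModelChoice:
    """Map a coarse task label to one of the three models (hypothetical labels)."""
    if task == "reasoning":
        return ModelChoice("Command A+ (W4A4)", "agentic, multilingual reasoning")
    if task == "ocr":
        if high_throughput:
            return ModelChoice("GLM-OCR", "compact 0.9B model for high-throughput OCR")
        return ModelChoice("Chandra OCR 2", "complex document layout extraction")
    raise ValueError(f"unknown task: {task!r}")


choice = pick_model("ocr", high_throughput=True)
print(choice.name, "-", choice.reason)
```

In a real pipeline the choice would also depend on measured accuracy and latency on your own documents, which this helper does not capture.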
Topics
- Cohere Command A+
- Chandra OCR 2
- GLM-OCR
- Low-bit Quantization
- Optical Character Recognition
- Microsoft Foundry
Best for: AI Architect, NLP Engineer, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.