Now in Foundry: Command A+ (W4A4), Chandra OCR 2, and GLM-OCR

Source: Microsoft Foundry Blog articles · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Intermediate, medium

Summary

Microsoft Foundry now offers three new Hugging Face models that reflect current trends in low-bit quantization and advanced OCR. All three are deployable via Microsoft Foundry.

Cohere Labs' Command A+ (W4A4) is a 218B-parameter Sparse Mixture-of-Experts reasoning model with 25B active parameters per token, a 128K-token input context, and a 64K-token output limit. It is optimized for agentic, multilingual (48 languages), and multimodal reasoning tasks, and uses W4A4 quantization (4-bit weights and 4-bit activations) for efficient deployment.

Datalab's Chandra OCR 2 is a 5.3B-parameter vision-language model that converts images and PDFs to markdown, HTML, and JSON while preserving layout. It reports state-of-the-art results on the olmOCR benchmark with a score of 85.9% (77.8% multilingual), supports over 90 languages, and is strongest on complex layouts.

Z.ai's GLM-OCR is a compact 0.9B-parameter OCR model built on the GLM-V architecture, roughly 6x smaller than Chandra OCR 2. It ranks first on OmniDocBench V1.5 with a score of 94.62, combining high accuracy on structured documents with efficient, scalable deployment.
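To make the efficiency argument concrete, the sketch below estimates how much storage the weights of a 218B-parameter model need at different bit widths, and checks the "roughly 6x smaller" size comparison between the two OCR models. The helper name `weight_gb` is hypothetical, and the numbers are a lower bound under simplifying assumptions: the estimate ignores quantization scales, any layers kept at higher precision, activations, and the KV cache.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits,
    with no overhead for scales or zero points.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the summary above (in billions).
COMMAND_A_PLUS_TOTAL = 218
CHANDRA_OCR_2 = 5.3
GLM_OCR = 0.9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gb(COMMAND_A_PLUS_TOTAL, bits):.0f} GB")

# Ratio behind the "roughly 6x smaller" statement.
size_ratio = CHANDRA_OCR_2 / GLM_OCR
print(f"Chandra OCR 2 / GLM-OCR parameter ratio: {size_ratio:.1f}x")
```

Under these assumptions the weights drop from about 436 GB at 16 bits to about 109 GB at 4 bits. In a Mixture-of-Experts model, all experts typically have to be resident in memory even though only 25B parameters are active per token, so the estimate uses the full 218B count.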

Key takeaway

For AI engineers deploying large reasoning models or building document processing pipelines, the three new Hugging Face models in Microsoft Foundry cover distinct needs. If you need to run a 218B-parameter model efficiently, Command A+ (W4A4) is quantized for that purpose. For multilingual OCR with complex layout understanding, Chandra OCR 2 is a strong choice. For high-throughput scenarios, GLM-OCR offers high accuracy at a compact 0.9B scale. All three can be evaluated in Foundry against your own workloads.

Key insights

Low-bit quantization is making very large models more efficient to deploy, and specialized OCR models are making document understanding more efficient.

Method

Command A+ (W4A4) is post-trained to match a full-precision teacher's output distribution. The forward pass applies fake quantization, and backpropagation uses straight-through estimators to carry gradients past the non-differentiable rounding step, which mitigates quantization error.
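The general technique described here can be sketched in a few lines of plain Python. This is a toy illustration, not Cohere's training recipe: the function names, the single linear layer, the fixed `scale`, and the learning rate are all assumptions made for the example, and only weights are fake-quantized (W4A4 also quantizes activations). The straight-through estimator shown is one common variant that passes the gradient through rounding unchanged and zeroes it where the value was clipped.

```python
import math


def fake_quantize(x: float, scale: float, bits: int = 4) -> float:
    """Snap x to a signed `bits`-bit integer grid, then map back to float."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def ste_grad(x: float, scale: float, upstream: float, bits: int = 4) -> float:
    """Straight-through estimator: treat rounding as identity in the backward
    pass; block the gradient where x fell outside the representable range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return upstream if qmin <= x / scale <= qmax else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(weights, x, teacher_probs, scale=0.1, lr=0.05):
    """One update of the student's latent full-precision weights."""
    # Forward pass: logits are computed from fake-quantized weights.
    w_q = [[fake_quantize(w, scale) for w in row] for row in weights]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in w_q]
    student_probs = softmax(logits)
    loss = kl_divergence(teacher_probs, student_probs)

    # Backward pass: for softmax outputs, d KL / d logit_i = student_i - teacher_i.
    new_weights = []
    for row, s, t in zip(weights, student_probs, teacher_probs):
        new_row = []
        for w, xi in zip(row, x):
            grad_wq = (s - t) * xi  # gradient w.r.t. the quantized weight
            new_row.append(w - lr * ste_grad(w, scale, grad_wq))
        new_weights.append(new_row)
    return new_weights, loss


# Hypothetical teacher distribution over three classes and one input vector.
teacher_probs = softmax([1.2, -0.4, 0.3])
x = [1.0, 0.5]
weights = [[0.0, 0.0] for _ in range(3)]

for step in range(200):
    weights, loss = distill_step(weights, x, teacher_probs)
print(f"KL(teacher || student) on the last step: {loss:.4f}")
```

The latent weights stay in full precision and receive the gradient updates, while the loss is always measured on the quantized forward pass, so the student is optimized for the behavior it will have once deployed at 4 bits.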

Best for: AI Architect, NLP Engineer, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.