Now in Foundry: Cohere Transcribe, Nanbeige 4.1-3B, and Octen Embedding

2026-04-06 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

This week's Model Mondays highlights three open-source AI models available through Microsoft Foundry, each addressing a different layer of the AI application stack. Cohere's cohere-transcribe-03-2026 is a 2B-parameter automatic speech recognition (ASR) model that leads the Open ASR Leaderboard with a 5.42% average Word Error Rate (WER) across 8 English benchmarks; it supports 14 languages and chunks long-form audio automatically. Nanbeige's Nanbeige4.1-3B is a 3B-parameter reasoning model that surpasses larger models on coding, math, and deep-search benchmarks, scoring 76.9 on LiveCodeBench-V6 and 73.2 on Arena-Hard-v2, and it supports tool use natively. Octen's Octen-Embedding-0.6B is a 0.6B-parameter text embedding model with a 0.7241 mean task score on the RTEB leaderboard, ahead of larger proprietary models; it offers a 32,768-token context window for long-document retrieval across 100+ languages and specialized domains.
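For readers unfamiliar with the Word Error Rate figure quoted above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function names and the simple whitespace tokenization are assumptions made for illustration; the source does not describe the leaderboard's text normalization or exactly how it averages across its 8 benchmarks.

```python
from statistics import fmean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no normalization.
    Real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(per_benchmark_wer: list[float]) -> float:
    """Unweighted mean of per-benchmark WER values (an assumed aggregation)."""
    return fmean(per_benchmark_wer)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower WER is better, so comparing this number on your own audio samples is a reasonable first check before relying on leaderboard averages.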

Key takeaway

For AI Architects and NLP Engineers building production-grade systems, these models show that compact open-source models can deliver high performance across speech, reasoning, and retrieval. Consider integrating them via Microsoft Foundry to reduce cost and improve efficiency without sacrificing accuracy, especially for multilingual or domain-specific applications. Evaluate their benchmark results against your specific requirements before committing to larger, more resource-intensive alternatives.

Key insights

Specialized, smaller open-source models can outperform larger general-purpose models on speech recognition, reasoning, and retrieval benchmarks.

Principles

Targeted fine-tuning improves performance
Dedicated architectures enhance efficiency
Preference alignment boosts output quality

Method

Deploy open-source models from Hugging Face either directly from Microsoft Foundry's catalog or through the "Deploy on Microsoft Foundry" option on the Hugging Face Hub for secure, scalable inference.

In practice

Use cohere-transcribe-03-2026 for multilingual meeting transcription
Apply Nanbeige4.1-3B for complex agentic workflows
Employ Octen-Embedding-0.6B for domain-specific semantic search
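As a rough illustration of the semantic search use case, the sketch below ranks documents by cosine similarity between a query embedding and document embeddings. It assumes the vectors have already been produced by an embedding model such as Octen-Embedding-0.6B; the three-dimensional toy vectors, the document names, and the function names are hypothetical and stand in for real model output, and no Foundry or Hugging Face API call is shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings; a real model returns far larger vectors.
    toy_docs = {
        "contract": [0.9, 0.1, 0.0],
        "recipe": [0.0, 0.2, 0.9],
        "invoice": [0.7, 0.3, 0.1],
    }
    toy_query = [1.0, 0.0, 0.0]
    for doc_id, score in rank_documents(toy_query, toy_docs):
        print(f"{doc_id}: {score:.3f}")
```

For large corpora, a vector index would normally replace this linear scan; the brute-force version is shown only because it makes the ranking logic easy to follow.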

Topics

Cohere Transcribe
Nanbeige 4.1-3B
Octen Embedding
Automatic Speech Recognition
Text Embeddings

Best for: AI Architect, NLP Engineer, Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.