Build Small with Cohere Labs

2026-06-11 · Source: HuggingFace · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

Cohere, a leading enterprise foundation model provider, and its research arm, Cohere Labs, presented three models for the "Build Small Hackathon." Tiny Aya is a compact, 3.3-billion-parameter, open-weight model that supports more than 70 languages and is designed for on-device deployment on phones and laptops. It is offered with 8-bit, 4-bit, and 6-bit mixed-precision quantization and reaches 32 tokens per second on an iPhone 17 Pro. North Mini Code, a 30-billion-parameter mixture-of-experts (MoE) model with 3 billion active parameters, specializes in code generation and agentic software engineering and supports 8-bit quantization. Cohere Transcribe, a 2-billion-parameter speech recognition model, is an encoder-decoder transformer that places about 90% of its parameters in the encoder for fast inference. Trained on 0.5 million hours of data, it supports 14 languages, is designed for verbatim transcription, and excels in far-field scenarios.
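As a rough illustration of why the quantization options matter on phones and laptops, the sketch below estimates weight storage and generation time from the figures quoted above. It assumes weight memory is simply parameters times bits per weight, treats the 6-bit mixed-precision option as averaging 6 bits per weight, and ignores quantization metadata, activations, and the KV cache. The function names are illustrative and not part of any Cohere tooling.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


TINY_AYA_PARAMS = 3.3e9          # parameter count quoted in the summary
NORTH_MINI_TOTAL_PARAMS = 30e9   # MoE total; all experts must be stored
NORTH_MINI_ACTIVE_PARAMS = 3e9   # parameters used per token

# Tiny Aya weight footprint; 16-bit is included only as an unquantized baseline.
for bits in (16, 8, 6, 4):
    print(f"Tiny Aya at {bits}-bit: ~{weight_memory_gb(TINY_AYA_PARAMS, bits):.2f} GB")

# An MoE model stores every expert but computes with only the active subset.
active_fraction = NORTH_MINI_ACTIVE_PARAMS / NORTH_MINI_TOTAL_PARAMS
print(f"North Mini Code at 8-bit: ~{weight_memory_gb(NORTH_MINI_TOTAL_PARAMS, 8):.0f} GB stored")
print(f"North Mini Code active fraction per token: {active_fraction:.0%}")

# At the quoted 32 tokens per second, a 256-token reply takes about 8 seconds.
print(f"256 tokens at 32 tok/s: {generation_seconds(256, 32):.1f} s")
```

Under these assumptions, halving the bits per weight halves the storage needed, which is the main reason lower-precision variants fit on devices with limited memory.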

Key takeaway

For AI engineers building compact, multilingual applications, Cohere's Tiny Aya models offer on-device deployment across 70+ languages with several quantization options. Consider placing a dedicated transcription model such as Cohere Transcribe in front of an LLM so that the LLM's text performance is preserved, especially when real-time or robust far-field audio processing is critical. For efficient code generation, explore North Mini Code.

Key insights

Cohere focuses on compact, efficient, and multilingual AI models for diverse applications and deployment environments.

Principles

Prioritize multilingual support, especially for low-resource languages.
Design models for on-device and air-gapped environments.
Optimize for inference speed by allocating parameters strategically.

Method

Cohere Transcribe uses an encoder-decoder transformer with a Conformer encoder and places about 90% of its parameters in the encoder for faster, cheaper inference. It was trained on 0.5M hours of cleaned open-source and synthetic data.
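One way to see why an encoder-heavy split can speed up inference: in an encoder-decoder model the encoder processes the audio once, while the decoder runs once per generated token. The sketch below uses a deliberately crude cost model (parameters applied per pass, ignoring audio length, attention cost, and caching) and compares the quoted split with a hypothetical 50/50 split. The cost model and function names are assumptions for illustration, not Cohere's own analysis.

```python
TOTAL_PARAMS = 2e9        # Cohere Transcribe size quoted in the summary
ENCODER_FRACTION = 0.9    # approximate encoder share quoted in the summary


def split_params(total: float, encoder_fraction: float) -> tuple[float, float]:
    """Return (encoder_params, decoder_params) for a given split."""
    encoder = total * encoder_fraction
    return encoder, total - encoder


def params_applied(num_output_tokens: int, encoder: float, decoder: float) -> float:
    """Crude cost proxy: encoder runs once, decoder runs once per output token."""
    return encoder + num_output_tokens * decoder


encoder_heavy = split_params(TOTAL_PARAMS, ENCODER_FRACTION)
balanced = split_params(TOTAL_PARAMS, 0.5)  # hypothetical comparison point

tokens = 100  # illustrative transcript length
cost_heavy = params_applied(tokens, *encoder_heavy)
cost_balanced = params_applied(tokens, *balanced)

print(f"Encoder-heavy split: {cost_heavy / 1e9:.1f}B parameter applications")
print(f"Balanced split:      {cost_balanced / 1e9:.1f}B parameter applications")
print(f"Ratio: {cost_balanced / cost_heavy:.1f}x")
```

Under this simplified model, the small decoder keeps the per-token cost low, so the advantage of the encoder-heavy split grows with transcript length.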

In practice

Use Tiny Aya Global for general English applications.
Experiment with Tiny Aya regional variants for specific language subsets.
Combine transcription models with LLMs for better text performance.
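The last recommendation, putting a transcription model in front of a text LLM, can be sketched as a two-stage pipeline. The example below is a minimal sketch: `transcribe` and `generate` are hypothetical callables standing in for whatever speech and text models are actually deployed, and the stub implementations exist only so the example runs. None of these names come from a Cohere API.

```python
from typing import Callable

Transcriber = Callable[[bytes], str]   # audio bytes -> transcript text
Generator = Callable[[str], str]       # prompt text -> model reply


def voice_to_answer(audio: bytes, transcribe: Transcriber, generate: Generator) -> str:
    """Transcribe audio first, then pass the text to a text-only LLM."""
    transcript = transcribe(audio)
    prompt = f"Transcript: {transcript}"
    return generate(prompt)


# Stub models (hypothetical) so the sketch is runnable without any real model.
def stub_transcribe(audio: bytes) -> str:
    return f"<{len(audio)} bytes of speech>"


def stub_generate(prompt: str) -> str:
    return f"Reply to [{prompt}]"


print(voice_to_answer(b"\x00\x01\x02", stub_transcribe, stub_generate))
```

Keeping the two stages behind plain callables means either model can be swapped (for example, a regional Tiny Aya variant for the text stage) without touching the pipeline itself.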

Topics

Tiny Aya
Multilingual LLMs
On-Device AI
Speech Recognition
Code Generation
Model Quantization

Best for: MLOps Engineer, NLP Engineer, AI Architect, AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.