Build Small with Cohere Labs

Source: HuggingFace · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

Cohere, a leading enterprise foundation model provider, and its research arm, Cohere Labs, presented three models for the "Build Small Hackathon." Tiny Aya is a compact, 3.3-billion-parameter open-weight model that supports more than 70 languages. It is designed for on-device deployment on phones and laptops, is offered with 8-bit, 4-bit, and 6-bit mixed-precision quantization, and reaches 32 tokens per second on an iPhone 17 Pro. North Mini Code is a 30-billion-parameter mixture-of-experts (MoE) model with 3 billion active parameters; it specializes in code generation and agentic software engineering and supports 8-bit quantization. Cohere Transcribe is a 2-billion-parameter speech recognition model built as an encoder-decoder transformer, with about 90% of its parameters in the encoder for fast inference. Trained on 0.5 million hours of data, it supports 14 languages, is designed for verbatim transcription, and performs well in far-field scenarios.
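As a back-of-envelope illustration of why the quantization options matter for on-device use, the sketch below estimates weight storage from the parameter counts quoted in the summary. It assumes storage is simply parameters times bits per weight and ignores quantization scales, activations, and the KV cache, so real footprints will be somewhat higher. The function name, the 16-bit baseline, and the use of GB as 10^9 bytes are illustrative choices, not figures from the presentation.

```python
def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes), counting weights only."""
    return params * bits_per_weight / 8 / 1e9


# Parameter counts as quoted in the summary.
TINY_AYA_PARAMS = 3.3e9
NORTH_MINI_CODE_TOTAL = 30e9
NORTH_MINI_CODE_ACTIVE = 3e9

# 16-bit is an assumed unquantized baseline, included only for comparison.
for bits in (16, 8, 4):
    size = weight_gigabytes(TINY_AYA_PARAMS, bits)
    print(f"Tiny Aya at {bits}-bit: ~{size:.2f} GB of weights")

# In an MoE model, all experts are typically stored,
# but only the active parameters are used for each token.
stored = weight_gigabytes(NORTH_MINI_CODE_TOTAL, 8)
active_fraction = NORTH_MINI_CODE_ACTIVE / NORTH_MINI_CODE_TOTAL
print(f"North Mini Code at 8-bit: ~{stored:.0f} GB of weights stored")
print(f"Active parameters per token: {active_fraction:.0%} of the total")
```

Under these assumptions, halving the bits per weight halves the weight storage, which is the main lever for fitting a model of this size on a phone or laptop.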

Key takeaway

For AI engineers building compact, multilingual applications, Cohere's Tiny Aya models offer on-device deployment across 70+ languages with several quantization options. Consider using a dedicated transcription model such as Cohere Transcribe as a front end for an LLM, so that the LLM's text performance is preserved, especially when real-time or robust far-field audio processing is critical. For efficient code generation, explore North Mini Code.
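The front-end pattern in the takeaway can be sketched as a simple composition of two steps: speech to text, then text to response. The example below is a minimal sketch with hypothetical stand-in functions; `make_voice_pipeline`, `fake_transcribe`, and `fake_generate` are illustrative names and do not correspond to any Cohere API. In a real application, each stub would be replaced by a call to an actual transcription model and a text-only LLM.

```python
from typing import Callable


def make_voice_pipeline(
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
) -> Callable[[bytes], str]:
    """Compose a speech-to-text step and a text-only LLM step into one callable."""

    def run(audio: bytes) -> str:
        transcript = transcribe(audio)  # dedicated ASR model handles the audio
        return generate(transcript)     # the LLM only ever sees text

    return run


# Hypothetical stand-ins so the sketch runs without any model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"echo: {prompt}"


pipeline = make_voice_pipeline(fake_transcribe, fake_generate)
print(pipeline(b"hello world"))
```

Keeping the two stages behind plain callables means either model can be swapped without touching the other, and the LLM itself is left unchanged.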

Key insights

Cohere focuses on compact, efficient, and multilingual AI models for diverse applications and deployment environments.

Method

Cohere Transcribe uses an encoder-decoder transformer with a Conformer encoder and places about 90% of its parameters in the encoder, which makes inference faster and cheaper. It was trained on 0.5M hours of cleaned open-source and synthetic data.
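One way to see why an encoder-heavy split can help: in a typical encoder-decoder model, the encoder processes the audio in a single parallel pass, while the decoder runs once per output token in sequence. The sketch below is a rough illustration under that assumption, not a measurement of Cohere Transcribe. It compares the decoder weights touched across sequential decode steps for the quoted 90/10 split against a hypothetical 50/50 split of the same 2 billion parameters; the function names and the 100-token output length are illustrative.

```python
def split_parameters(total: float, encoder_fraction: float) -> tuple[float, float]:
    """Return (encoder_params, decoder_params) for a given encoder share."""
    encoder = total * encoder_fraction
    return encoder, total - encoder


def decoder_weight_reads(decoder_params: float, output_tokens: int) -> float:
    """Decoder weights touched across sequential steps (one pass per token)."""
    return decoder_params * output_tokens


TOTAL_PARAMS = 2e9   # size quoted for Cohere Transcribe
OUTPUT_TOKENS = 100  # assumed transcript length, for illustration only

enc, dec = split_parameters(TOTAL_PARAMS, 0.9)           # split quoted above
_, dec_balanced = split_parameters(TOTAL_PARAMS, 0.5)    # hypothetical baseline

ratio = (
    decoder_weight_reads(dec_balanced, OUTPUT_TOKENS)
    / decoder_weight_reads(dec, OUTPUT_TOKENS)
)
print(f"Encoder: ~{enc / 1e9:.1f}B params, decoder: ~{dec / 1e9:.1f}B params")
print(f"Sequential decoder work vs. a 50/50 split: ~{ratio:.0f}x less")
```

This simple model ignores attention cost, batching, and hardware effects; it only shows that shrinking the decoder reduces the work done in the part of inference that cannot be parallelized across output tokens.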

Best for: MLOps Engineer, NLP Engineer, AI Architect, AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.