Build Small with Cohere Labs
Summary
Cohere, a leading enterprise foundation model provider, and its research arm, Cohere Labs, presented three models for the "Build Small" hackathon. Tiny Aya is a compact, 3.3 billion parameter, open-weight model that supports over 70 languages and is designed for on-device deployment on phones and laptops, with 8-bit, 4-bit, and 6-bit mixed-precision quantization options. It achieves 32 tokens per second on an iPhone 17 Pro. North Mini Code, a 30 billion parameter mixture-of-experts (MoE) model with 3 billion active parameters, specializes in code generation and agentic software engineering and supports 8-bit quantization. Cohere Transcribe, a 2 billion parameter speech recognition model, is an encoder-decoder transformer that places about 90% of its parameters in the encoder for fast inference. Trained on 0.5 million hours of data, it supports 14 languages, is designed for verbatim transcription, and excels in far-field scenarios.
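To put the quantization options in perspective, the sketch below estimates weight storage from the parameter counts quoted above. It is back-of-the-envelope arithmetic under stated assumptions, not a published figure: it counts weights only (no quantization scales, activations, or KV cache), and it treats the 6-bit mixed-precision option as averaging roughly 6 bits per weight, which is an assumption. The function and constant names are illustrative.

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


TINY_AYA_PARAMS = 3_300_000_000          # 3.3B, from the summary
NORTH_MINI_CODE_PARAMS = 30_000_000_000  # 30B total; an MoE still stores every expert

# Assumption: the "6-bit mixed precision" variant averages about 6 bits per weight.
for label, bits in [("8-bit", 8), ("6-bit mixed", 6), ("4-bit", 4)]:
    size = weight_memory_gb(TINY_AYA_PARAMS, bits)
    print(f"Tiny Aya {label}: ~{size:.2f} GB of weights")

size = weight_memory_gb(NORTH_MINI_CODE_PARAMS, 8)
print(f"North Mini Code 8-bit: ~{size:.0f} GB of weights")

# At the quoted 32 tokens per second, a 256-token reply takes:
print(f"{generation_seconds(256, 32):.0f} s")
```

Under these assumptions Tiny Aya needs about 3.3 GB at 8-bit, 2.5 GB at 6-bit, and 1.65 GB at 4-bit. North Mini Code activates only 3 billion parameters per token, but all 30 billion still have to be stored, so its 8-bit footprint is about 30 GB.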
Key takeaway
For AI engineers building compact, multilingual applications, Cohere's Tiny Aya models offer on-device deployment across 70+ languages with several quantization options. Consider placing a dedicated transcription model such as Cohere Transcribe in front of an LLM so that the LLM keeps its text performance, especially when real-time or robust far-field audio processing is critical. For efficient code generation, explore North Mini Code.
Key insights
Cohere focuses on compact, efficient, and multilingual AI models for diverse applications and deployment environments.
Principles
- Prioritize multilingual support, especially for low-resource languages.
- Design models for on-device and air-gapped environments.
- Optimize for inference speed by allocating parameters strategically.
Method
Cohere Transcribe uses an encoder-decoder transformer with a conformer encoder, placing ~90% of parameters in the encoder for faster, cheaper inference. It was trained on 0.5M hours of cleaned open-source and synthetic data.
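One plausible reading of why the encoder-heavy split helps: in an encoder-decoder model the encoder processes the audio once, in parallel, while the decoder runs once per generated token. The sketch below works through the arithmetic for the 2 billion parameter total and ~90% encoder share quoted above. The 50/50 comparison model is hypothetical, and the cost estimate uses the common rule of thumb of about 2 FLOPs per parameter per token for a forward pass, which ignores attention over the context.

```python
TOTAL_PARAMS = 2_000_000_000  # 2B, from the summary
ENCODER_SHARE_PCT = 90        # ~90%, from the summary


def split_params(total: int, encoder_pct: int) -> tuple[int, int]:
    """Return (encoder_params, decoder_params) for a given encoder share."""
    encoder = total * encoder_pct // 100
    return encoder, total - encoder


def decoder_flops_per_token(decoder_params: int) -> int:
    """Rule-of-thumb forward cost: ~2 FLOPs per parameter per generated token."""
    return 2 * decoder_params


encoder, decoder = split_params(TOTAL_PARAMS, ENCODER_SHARE_PCT)
print(f"encoder: {encoder / 1e9:.1f}B, decoder: {decoder / 1e9:.1f}B")

# Hypothetical baseline: same total size, split evenly between the two halves.
_, balanced_decoder = split_params(TOTAL_PARAMS, 50)
ratio = decoder_flops_per_token(balanced_decoder) / decoder_flops_per_token(decoder)
print(f"per-token decoder cost vs. a 50/50 split: {ratio:.0f}x lower")
```

With these numbers the encoder holds about 1.8 billion parameters and the decoder about 0.2 billion, so each autoregressive step costs roughly a fifth of what it would in an evenly split model of the same size.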
In practice
- Use Tiny Aya Global for general English applications.
- Experiment with Tiny Aya regional variants for specific language subsets.
- Use a transcription model as a front-end to an LLM to preserve the LLM's text performance.
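The transcription-then-LLM pattern from the list above can be sketched as a two-stage pipeline. This is a minimal illustration, not Cohere's API: `make_voice_pipeline`, `fake_transcribe`, and `fake_generate` are hypothetical names, and the two stub functions stand in for a real speech recognition model and a real text model.

```python
from typing import Callable

Transcriber = Callable[[bytes], str]  # audio bytes -> verbatim transcript
TextModel = Callable[[str], str]      # text prompt -> text reply


def make_voice_pipeline(
    transcribe: Transcriber, generate: TextModel
) -> Callable[[bytes], str]:
    """Chain a speech-to-text stage into a text-only LLM stage."""

    def run(audio: bytes) -> str:
        transcript = transcribe(audio).strip()
        if not transcript:
            # Nothing was recognized, so skip the LLM call.
            return ""
        return generate(transcript)

    return run


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"LLM reply to: {prompt}"


pipeline = make_voice_pipeline(fake_transcribe, fake_generate)
print(pipeline(b"turn on the lights"))
```

Keeping the two stages behind plain callables means the LLM only ever sees text, and either stage can be swapped (for example, a regional Tiny Aya variant as the text model) without touching the other.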
Topics
- Tiny Aya
- Multilingual LLMs
- On-Device AI
- Speech Recognition
- Code Generation
- Model Quantization
Best for: MLOps Engineer, NLP Engineer, AI Architect, AI Engineer, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.