Mistral drops Voxtral Transcribe 2, an open-source speech model that runs on-device for pennies

2026-02-04 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Mistral AI has released Voxtral Transcribe 2, a new suite of open-source speech-to-text models designed for on-device processing. The suite includes Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live audio, and Mistral positions both as more accurate, faster, and cheaper than existing market solutions. Voxtral Mini Transcribe V2 is available via API at $0.003 per minute and supports 13 languages. Voxtral Realtime is released under the Apache 2.0 license and can reach latency as low as 200 milliseconds. A key differentiator is that the models can process sensitive audio locally on devices such as smartphones or laptops, which addresses data privacy concerns in regulated industries such as healthcare and finance. Mistral also introduced context biasing, which lets users upload specialized terminology to improve transcription accuracy without retraining.
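To put the quoted $0.003 per minute in concrete terms, here is a minimal cost sketch. It assumes billing is strictly linear in audio minutes, with no minimum charge, rounding rule, or volume discount; the summary does not state any of these. The function name is illustrative and not part of any Mistral SDK.

```python
from decimal import Decimal

# Per-minute API price quoted in the summary for Voxtral Mini Transcribe V2.
PRICE_PER_MINUTE_USD = Decimal("0.003")


def transcription_cost_usd(audio_minutes, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate API cost in USD, assuming purely linear per-minute billing."""
    minutes = Decimal(str(audio_minutes))
    if minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return minutes * price_per_minute


if __name__ == "__main__":
    # Hypothetical workloads, expressed in hours of audio.
    for hours in (1, 100, 10_000):
        cost = transcription_cost_usd(hours * 60)
        print(f"{hours:>6} h of audio: ${cost:,.2f}")
```

Under that assumption, one hour of audio costs $0.18 and 10,000 hours cost $1,800. Decimal is used so that sub-cent prices do not pick up floating-point rounding error.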

Key takeaway

For CTOs and machine learning engineers evaluating speech-to-text solutions, Mistral's Voxtral Transcribe 2 offers a privacy-first alternative. Its on-device processing and open-source real-time model can reduce data sovereignty risk and operating cost, especially for applications in regulated sectors. Test Voxtral Transcribe 2 in Mistral Studio to assess how it performs on your own audio data and specialized terminology.

Key insights

Mistral's Voxtral Transcribe 2 offers accurate, cost-effective, on-device speech-to-text with strong privacy features.

Principles

On-device processing enhances data privacy.
Smaller models can achieve competitive performance.
Open-source fosters application innovation.

Method

Voxtral Transcribe 2 uses a 4-billion-parameter model trained on curated data. Context biasing improves accuracy and robustness on specialized terminology without fine-tuning.

In practice

Deploy Voxtral Realtime for live subtitling.
Use context biasing for industry-specific jargon.
Process pre-recorded audio with Voxtral Mini Transcribe V2.
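The three recommendations above can be sketched as a small planning helper. This is an illustration only: the model strings are the display names from this summary, not confirmed API identifiers, and `TranscriptionPlan` and `plan_transcription` are hypothetical names. The summary also does not say whether both models accept bias terms, so the sketch simply carries a cleaned term list alongside the chosen model.

```python
from dataclasses import dataclass

# Display names from the summary; NOT confirmed API model identifiers.
BATCH_MODEL = "Voxtral Mini Transcribe V2"
REALTIME_MODEL = "Voxtral Realtime"


@dataclass(frozen=True)
class TranscriptionPlan:
    """Hypothetical container: which model to use and which terms to bias toward."""
    model: str
    bias_terms: tuple


def plan_transcription(is_live, domain_terms=()):
    """Pick the realtime model for live audio, the batch model otherwise,
    and normalize the jargon list (trim whitespace, drop blanks and
    case-insensitive duplicates, keep first-seen order)."""
    seen = set()
    cleaned = []
    for term in domain_terms:
        stripped = term.strip()
        key = stripped.lower()
        if stripped and key not in seen:
            seen.add(key)
            cleaned.append(stripped)
    model = REALTIME_MODEL if is_live else BATCH_MODEL
    return TranscriptionPlan(model=model, bias_terms=tuple(cleaned))


if __name__ == "__main__":
    # Live subtitling with a few hypothetical clinical terms.
    print(plan_transcription(True, [" tachycardia ", "Tachycardia", "metoprolol"]))
    # Pre-recorded audio with no specialized vocabulary.
    print(plan_transcription(False))
```

How the term list is actually submitted depends on Mistral's API, which this summary does not describe.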

Topics

Speech-to-Text
On-device AI
Open-source Models
Enterprise AI
Data Privacy

Best for: CTO, Machine Learning Engineer, NLP Engineer, AI Engineer, MLOps Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.