Mistral drops Voxtral Transcribe 2, an open-source speech model that runs on-device for pennies
Summary
Mistral AI has released Voxtral Transcribe 2, a new suite of open-source speech-to-text models designed for on-device processing. These models, including Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live audio, aim to offer superior accuracy, speed, and cost-efficiency compared to existing market solutions. Voxtral Mini Transcribe V2 is available via API at $0.003 per minute and supports 13 languages, while Voxtral Realtime, an Apache 2.0 open-source model, can achieve latencies down to 200 milliseconds. A key differentiator is their ability to process sensitive audio locally on devices like smartphones or laptops, addressing critical data privacy concerns for regulated industries such as healthcare and finance. Mistral also introduced context biasing, allowing users to upload specialized terminology for improved transcription accuracy without retraining.
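To put the quoted batch price in perspective, here is a minimal cost sketch. It assumes flat per-minute billing at the $0.003 rate cited above, with no volume tiers, minimums, or rounding rules, none of which the summary specifies. The function name and the example workloads are hypothetical illustrations, not part of any Mistral SDK.

```python
from decimal import Decimal

# Batch API price quoted in the summary for Voxtral Mini Transcribe V2.
PRICE_PER_MINUTE_USD = Decimal("0.003")


def estimate_transcription_cost(audio_minutes, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate API cost in USD, assuming simple linear per-minute billing."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    # Go through str() so float inputs such as 1.5 convert to Decimal cleanly.
    return Decimal(str(audio_minutes)) * price_per_minute


if __name__ == "__main__":
    # Hypothetical workloads, chosen only for illustration.
    workloads = [
        ("one 60-minute call", 60),
        ("1,000 hours of recordings", 1000 * 60),
    ]
    for label, minutes in workloads:
        print(f"{label}: ${estimate_transcription_cost(minutes):.2f}")
    # one 60-minute call: $0.18
    # 1,000 hours of recordings: $180.00
```

Under these assumptions, a large archive costs in the low hundreds of dollars to transcribe. Actual invoices may differ if pricing changes or if billing granularity is not strictly per minute.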
Key takeaway
For CTOs and Machine Learning Engineers evaluating speech-to-text solutions, Mistral's Voxtral Transcribe 2 offers a compelling privacy-first alternative. Its on-device processing capabilities and open-source real-time model can significantly reduce data sovereignty risks and operational costs, especially for applications in regulated sectors. You should test Voxtral Transcribe 2 in Mistral Studio to assess its performance with your specific audio data and specialized terminology.
Key insights
Mistral's Voxtral Transcribe 2 offers accurate, cost-effective, on-device speech-to-text with strong privacy features.
Principles
- On-device processing enhances data privacy.
- Smaller models can achieve competitive performance.
- Open-source fosters application innovation.
Method
Voxtral Transcribe 2 uses a 4-billion-parameter model trained on curated data to improve accuracy and robustness. Context biasing lets users supply specialized terminology, so the model handles domain-specific vocabulary without fine-tuning.
In practice
- Deploy Voxtral Realtime for live subtitling.
- Use context biasing for industry-specific jargon.
- Process pre-recorded audio with Voxtral Mini Transcribe V2.
Topics
- Speech-to-Text
- On-device AI
- Open-source Models
- Enterprise AI
- Data Privacy
Best for: CTO, Machine Learning Engineer, NLP Engineer, AI Engineer, MLOps Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.