Mistral drops new speech-to-text AI models
Summary
Mistral, a French AI startup, has launched two new speech-to-text models, Voxtral Mini Transcribe V2 and Voxtral Realtime, under the Voxtral Transcribe 2 umbrella. The models aim to set new benchmarks for speed, privacy, and affordability in transcription. Voxtral Realtime is designed for live audio: its novel streaming architecture delivers transcriptions with a configurable delay as low as 200 milliseconds, and it supports 13 languages. At four billion parameters, it is small enough to deploy on-device, which enhances privacy and security. It is available under an Apache 2.0 license or via API at $0.006 per minute. Voxtral Mini Transcribe V2 handles batch transcription of pre-recorded audio, offering speaker diarization, context biasing, and word-level timestamps. It supports the same 13 languages, achieves a 4% word error rate on the FLEURS benchmark, and is priced at $0.003 per minute.
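To make the per-minute prices concrete, here is a minimal sketch that estimates API cost for a given volume of audio. It uses only the two prices quoted in the summary; the dictionary keys are illustrative labels rather than official API model identifiers, and the sketch ignores any volume discounts, free tiers, or billing granularity that may apply.

```python
# Per-minute API prices in USD, as quoted in the summary.
# The keys are hypothetical labels, NOT official model identifiers.
PRICE_PER_MINUTE_USD = {
    "voxtral-realtime": 0.006,
    "voxtral-mini-transcribe-v2": 0.003,
}


def estimate_cost_usd(model: str, audio_minutes: float) -> float:
    """Return the estimated API cost for transcribing `audio_minutes` of audio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return PRICE_PER_MINUTE_USD[model] * audio_minutes


if __name__ == "__main__":
    hours_of_audio = 1000
    for model in PRICE_PER_MINUTE_USD:
        cost = estimate_cost_usd(model, hours_of_audio * 60)
        print(f"{model}: ${cost:,.2f} for {hours_of_audio} hours")
```

At these rates, batch transcription costs half as much per minute as the realtime API, so the choice mostly comes down to whether the application needs live output.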
Key takeaway
For AI architects and machine learning engineers evaluating speech-to-text solutions, Mistral's Voxtral models offer two distinct options. Consider Voxtral Realtime for live transcription that needs very low latency and on-device privacy, or Voxtral Mini Transcribe V2 for cost-effective batch processing with speaker diarization. Evaluate performance and pricing against your own project requirements, especially for multilingual and privacy-sensitive deployments.
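When evaluating performance on your own audio, word error rate (WER) is the usual yardstick: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below is a simplified, self-contained version of that calculation. It assumes lowercase, whitespace-split tokens with no further text normalization, so its numbers will not match published benchmark figures, which typically apply their own normalization rules.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between transcripts, divided by reference length.

    Simplifying assumption: tokens are lowercase, whitespace-separated words;
    punctuation and number formatting are not normalized.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # reference word missing from hypothesis
                    curr[j - 1] + 1,  # extra word in hypothesis
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A 4% WER corresponds to roughly one word error in every 25 reference words. Running the same reference set through each candidate model gives a like-for-like comparison on audio that resembles your production traffic.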
Key insights
Mistral's new speech-to-text models prioritize speed, privacy, and affordability for diverse enterprise applications.
Principles
- On-device processing enhances privacy.
- Streaming architecture improves real-time transcription.
- Multilingual support expands application scope.
Method
Voxtral Realtime uses a novel streaming architecture for low-latency live audio processing, with a configurable delay as low as 200 milliseconds. Voxtral Mini Transcribe V2 handles batch processing of pre-recorded audio, with features such as speaker diarization, context biasing, and word-level timestamps.
In practice
- Deploy Realtime for voice agents or subtitling.
- Use Mini Transcribe V2 for compliance documentation.
- Leverage on-device models for sensitive data.
Topics
- Speech-to-text
- Mistral AI
- Real-time Transcription
- On-device AI
- Speaker Diarization
Best for: AI Architect, Machine Learning Engineer, NLP Engineer, AI Engineer, AI Product Manager, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by aibusiness.