Mistral drops new speech-to-text AI models

2025-12-15 · Source: aibusiness · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

Mistral, a French AI startup, has launched two new speech-to-text models, Voxtral Mini Transcribe V2 and Voxtral Realtime, under the Voxtral Transcribe 2 umbrella. The models aim to set new benchmarks for speed, privacy, and affordability in transcription. Voxtral Realtime is designed for live audio: its novel streaming architecture delivers transcriptions with a configurable delay as low as 200 milliseconds and supports 13 languages. At four billion parameters, it can be deployed on-device, which keeps audio local for privacy and security. It is available under an Apache 2.0 license or via API at $0.006 per minute. Voxtral Mini Transcribe V2 handles batch transcription of pre-recorded audio, offering speaker diarization, context biasing, and word-level timestamps. It supports the same 13 languages, reports a 4% word error rate on the FLEURS benchmark, and is priced at $0.003 per minute.
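
To make the 4% figure concrete: word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below is a minimal illustration of that definition, not Mistral's or the FLEURS evaluation code. The function name and sample sentences are made up for this example, and real benchmark scoring usually normalizes case and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair: one wrong word out of five.
    wer = word_error_rate(
        "the quick brown fox jumps",
        "the quick brown box jumps",
    )
    print(f"WER: {wer:.2f}")  # prints "WER: 0.20"
```

On this definition, a 4% WER corresponds to roughly four word errors per hundred reference words.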

Key takeaway

For AI architects and machine learning engineers evaluating speech-to-text solutions, the two Voxtral models cover distinct needs. Consider Voxtral Realtime for live transcription that needs delays as low as 200 milliseconds or on-device privacy, and Voxtral Mini Transcribe V2 for low-cost batch processing with speaker diarization. Evaluate accuracy and pricing against your own project requirements, especially for multilingual and privacy-sensitive deployments.
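
As a starting point for the pricing comparison, the Python sketch below estimates API cost from the per-minute prices quoted in the summary ($0.006 for Voxtral Realtime, $0.003 for the batch model). The dictionary keys are illustrative labels rather than official API model identifiers, and the 10,000-minute workload is a hypothetical figure. It also ignores any volume discounts, free tiers, or self-hosting costs, none of which are covered in the source.

```python
# Per-minute API prices in USD, as quoted in the summary.
# The keys are illustrative labels, not official model identifiers.
PRICE_PER_MINUTE = {
    "voxtral-realtime": 0.006,
    "voxtral-mini-transcribe-v2": 0.003,
}


def estimate_cost(model: str, audio_minutes: float) -> float:
    """Return the estimated API cost in USD, rounded to cents."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return round(PRICE_PER_MINUTE[model] * audio_minutes, 2)


if __name__ == "__main__":
    monthly_minutes = 10_000  # hypothetical workload
    for model in PRICE_PER_MINUTE:
        cost = estimate_cost(model, monthly_minutes)
        print(f"{model}: ${cost:.2f} for {monthly_minutes} minutes")
```

At these list prices, the same audio costs twice as much through the real-time API as through the batch API, so the live model is worth paying for mainly when low latency is required.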

Key insights

Mistral's new speech-to-text models prioritize speed, privacy, and affordability for diverse enterprise applications.

Principles

On-device processing enhances privacy.
Streaming architecture improves real-time transcription.
Multilingual support expands application scope.

Method

Voxtral Realtime uses a novel streaming architecture to transcribe live audio with a configurable delay as low as 200 milliseconds. Voxtral Mini Transcribe V2 handles batch processing of pre-recorded audio, with speaker diarization, context biasing, and word-level timestamps.

In practice

Deploy Realtime for voice agents or subtitling.
Use Mini Transcribe V2 for compliance documentation.
Leverage on-device models for sensitive data.

Topics

Speech-to-text
Mistral AI
Real-time Transcription
On-device AI
Speaker Diarization

Best for: AI Architect, Machine Learning Engineer, NLP Engineer, AI Engineer, AI Product Manager, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by aibusiness.