ElevenLabs and Google dominate Artificial Analysis' updated speech-to-text benchmark
Summary
Artificial Analysis has released version 2.0 of its AA-WER speech-to-text benchmark, revealing ElevenLabs' Scribe v2 as the top performer with a word error rate (WER) of 2.3%. Google's Gemini 3 Pro followed closely at 2.9%, with Mistral's Voxtral Small achieving 3.0%. Google's Gemini 3 Flash (3.1%) and ElevenLabs' Scribe v1 (3.2%) also showed strong results. Notably, Google's strong performance stems from Gemini's general multimodal capabilities rather than specific transcription training. OpenAI's Whisper Large v3 recorded a 4.2% WER, while Alibaba's Qwen3 ASR Flash (5.9%), Amazon's Nova 2 Omni (6.0%), and Rev AI (6.1%) ranked lower. In the specialized AA-AgentTalk voice assistant test, Scribe v2 (1.6%) and Gemini 3 Pro (1.7%) again led, with AssemblyAI's Universal-3 Pro at 2.3%.
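To make the percentage gaps concrete: a WER of p% means roughly p word errors per 100 reference words, averaged over the benchmark's audio. The Python sketch below converts the reported AA-WER v2.0 scores into expected errors per 1,000 words. It assumes the benchmark-wide average carries over to your own recordings, which will not hold exactly for every accent, domain, or noise condition. The dictionary and helper names are illustrative, not part of any Artificial Analysis tooling.

```python
# Reported AA-WER v2.0 scores (percent), as listed in the summary above.
REPORTED_WER = {
    "ElevenLabs Scribe v2": 2.3,
    "Google Gemini 3 Pro": 2.9,
    "Mistral Voxtral Small": 3.0,
    "Google Gemini 3 Flash": 3.1,
    "ElevenLabs Scribe v1": 3.2,
    "OpenAI Whisper Large v3": 4.2,
    "Alibaba Qwen3 ASR Flash": 5.9,
    "Amazon Nova 2 Omni": 6.0,
    "Rev AI": 6.1,
}


def expected_errors(wer_percent: float, words: int) -> float:
    """Expected word errors in a transcript of `words` reference words.

    Assumes the benchmark's average WER applies to this transcript.
    """
    return wer_percent / 100 * words


if __name__ == "__main__":
    # Print models from lowest to highest reported WER.
    for model, wer in sorted(REPORTED_WER.items(), key=lambda item: item[1]):
        print(f"{model:<24} {expected_errors(wer, 1000):5.1f} errors per 1,000 words")
```

Read this way, the 1.9-point gap between Scribe v2 and Whisper Large v3 works out to about 19 extra word errors in every 1,000 transcribed words.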
Key takeaway
For NLP engineers evaluating speech-to-text solutions, ElevenLabs' Scribe v2 and Google's Gemini 3 Pro lead the latest AA-WER v2.0 benchmark. Prioritize these models for applications that demand high transcription accuracy, particularly voice assistant interactions, where they posted the lowest error rates on AA-AgentTalk (1.6% and 1.7%, versus 2.3% for AssemblyAI's Universal-3 Pro). Consider Gemini 3 Pro if your project also benefits from broader multimodal capabilities.
Key insights
ElevenLabs' Scribe v2 and Google's Gemini 3 Pro lead the latest speech-to-text benchmarks.
Principles
- Multimodal models can excel in specialized tasks.
- Independent benchmarks reveal measurable accuracy gaps between providers.
Method
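The AA-WER v2.0 benchmark evaluates speech-to-text models by word error rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript, so lower is better. A separate AA-AgentTalk test measures accuracy on voice assistant interactions.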
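As a rough illustration of how WER is computed, the Python sketch below aligns a model's hypothesis against a reference transcript using word-level Levenshtein distance, counting substitutions, deletions, and insertions. The `normalize` step (lowercasing and dropping punctuation) is a hypothetical choice for illustration; Artificial Analysis' actual text normalization for AA-WER is not described in this summary. Corpus-level scoring here pools errors and reference words across utterances, which is a common convention but also an assumption.

```python
import re


def normalize(text: str) -> list[str]:
    # Hypothetical normalization: lowercase, drop punctuation except apostrophes.
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_edits(ref: list[str], hyp: list[str]) -> int:
    # Word-level Levenshtein distance: minimum substitutions + deletions + insertions.
    prev = list(range(len(hyp) + 1))  # turning an empty ref into hyp[:j] takes j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into empty hyp takes i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edits(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    # Pool edits and reference words over all utterances, then divide once.
    errors = total = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors += word_edits(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references must contain at least one word")
    return errors / total


if __name__ == "__main__":
    # One deletion ("the") plus one substitution ("lights" -> "light") over 5 words.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Pooling at the corpus level keeps short utterances from dominating the score, which matters for voice assistant commands like those AA-AgentTalk targets, where a single error in a three-word request would otherwise register as a 33% WER.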
In practice
- Consider Scribe v2 for high-accuracy transcription.
- Evaluate Gemini 3 Pro for multimodal AI applications.
Topics
- Speech-to-Text
- Word Error Rate
- ElevenLabs Scribe
- Google Gemini
- Voice Assistants
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.