Angelos Perivolaropoulos of ElevenLabs at RAAIS 2026
Summary
Angelos Perivolaropoulos, Head of Research Engineering for speech-to-text at ElevenLabs, will speak at RAAIS 2026 on June 12th in London. His work centers on ElevenLabs' Scribe v2 and Scribe v2 Realtime transcription models. Scribe v2, launched in January 2026, is optimized for high-accuracy batch transcription of complex recordings; it achieves a 2.3% word error rate on the AA-WER v2.0 benchmark and detects entities across 56 categories. Scribe v2 Realtime, released in November 2025, provides low-latency live transcription (around 150 milliseconds) in more than 90 languages, with automatic language detection, predictive transcription, and the lowest reported word error rate on the FLEURS multilingual benchmark for low-latency ASR. Together, the two models address the trade-off between transcription accuracy and real-time performance in interactive AI systems.
Key takeaway
Machine learning engineers developing interactive voice AI agents should treat transcription accuracy and real-time latency as separate, equally critical design considerations. Offline model quality alone is insufficient: the system must stream partial understanding quickly and stably, within the latency budget of human conversation, to earn user trust and support effective interaction. For live applications, evaluate solutions such as ElevenLabs' Scribe v2 Realtime.
Key insights
Effective interactive voice AI in real-world use must balance high speech-to-text accuracy against low latency.
Principles
- Speech-to-text accuracy is not a single metric; it varies by context.
- Real-time transcription requires different optimizations than batch processing.
- Production AI success depends on speed, stability, and cost, not just offline accuracy.
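As a point of reference for the accuracy figures above, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that calculation, not the scoring procedure of any particular benchmark. The function name `word_error_rate` is hypothetical, and the normalization (lowercasing and whitespace splitting) is an assumption; real benchmarks apply their own text normalization, which is one reason a single model can score differently in different contexts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Benchmarks typically normalize more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")
```

Because the denominator is the reference length, the same number of errors produces a higher rate on short utterances than on long recordings, which matters when comparing batch and live workloads.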
Method
ElevenLabs employs two distinct models: Scribe v2 for high-accuracy batch processing and Scribe v2 Realtime for low-latency streaming. Each is optimized for specific production constraints and use cases.
In practice
- Use Scribe v2 for subtitling, media libraries, and compliance workflows.
- Deploy Scribe v2 Realtime for live agents and conversational interfaces.
- Consider latency budgets as crucial as accuracy for interactive AI.
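One way to make a latency budget concrete is to sum the per-stage latencies of a voice pipeline and compare the total with a target response time. The sketch below does only that arithmetic. The `Stage` class, both function names, the 800 ms budget, and every stage figure except the roughly 150 ms transcription latency cited in the summary are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a hypothetical voice-agent pipeline."""
    name: str
    latency_ms: float


def remaining_budget_ms(stages: list[Stage], budget_ms: float) -> float:
    """Return the budget left after all stages run sequentially."""
    return budget_ms - sum(stage.latency_ms for stage in stages)


def fits_budget(stages: list[Stage], budget_ms: float) -> bool:
    """True if the summed stage latency stays within the budget."""
    return remaining_budget_ms(stages, budget_ms) >= 0


# Only the 150 ms transcription figure comes from the summary above;
# the other values are placeholders chosen for illustration.
pipeline = [
    Stage("speech_to_text", 150.0),
    Stage("language_model", 400.0),
    Stage("text_to_speech", 200.0),
]

print(remaining_budget_ms(pipeline, budget_ms=800.0))
print(fits_budget(pipeline, budget_ms=700.0))
```

Framing latency this way shows how much headroom the transcription stage leaves for the rest of the system, which is the sense in which latency is a design constraint alongside accuracy.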
Topics
- Speech-to-Text
- ElevenLabs Scribe
- Real-time ASR
- Batch Transcription
- Voice AI
- Latency Optimization
- Word Error Rate
Best for: AI Architect, AI Engineer, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Air Street Press.