Angelos Perivolaropoulos of ElevenLabs at RAAIS 2026

2025-10-09 · Source: Air Street Press · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Angelos Perivolaropoulos, Head of Research Engineering for speech-to-text at ElevenLabs, will speak at RAAIS 2026 on June 12th in London. His work centers on ElevenLabs' Scribe v2 and Scribe v2 Realtime transcription models. Scribe v2, launched in January 2026, is optimized for high-accuracy batch transcription of complex recordings; it achieves a 2.3% word error rate on the AA-WER v2.0 benchmark and detects entities across 56 categories. Scribe v2 Realtime, released in November 2025, provides low-latency live transcription (around 150 milliseconds) in more than 90 languages, with automatic language detection and predictive transcription. It is reported to have the lowest word error rate on the FLEURS multilingual benchmark for low-latency ASR. Together, the two models address the trade-off between transcription accuracy and real-time performance in interactive AI systems.
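For readers unfamiliar with the metric quoted above, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that textbook definition. The function name and the lowercase-and-split normalization are assumptions for illustration; benchmarks such as AA-WER and FLEURS may apply their own text normalization, which is not described in the source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference words.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption, not the procedure of any specific benchmark.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Under this definition, a 2.3% WER corresponds to roughly 2 to 3 word-level errors per 100 reference words.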

Key takeaway

Machine learning engineers building interactive voice AI agents should treat transcription accuracy and real-time latency as separate, equally critical design considerations. Offline model quality alone is insufficient: the system needs to stream partial results quickly and stably within human conversational latency budgets to earn user trust and support effective interaction. For live applications, evaluate solutions such as ElevenLabs' Scribe v2 Realtime.

Key insights

Real-world voice AI must balance high speech-to-text accuracy against low latency for interactive systems to work well.

Principles

Speech-to-text accuracy is not a single metric; it varies by context.
Real-time transcription requires different optimizations than batch processing.
Production AI success depends on speed, stability, and cost, not just offline accuracy.

Method

ElevenLabs employs two distinct models: Scribe v2 for high-accuracy batch processing and Scribe v2 Realtime for low-latency streaming. Each is optimized for specific production constraints and use cases.

In practice

Use Scribe v2 for subtitling, media libraries, and compliance workflows.
Deploy Scribe v2 Realtime for live agents and conversational interfaces.
Treat latency budgets as being as crucial as accuracy for interactive AI.
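One way to make a latency budget concrete is to add up per-stage latencies for a voice agent turn and compare the total with a target. The sketch below is illustrative only: the `Stage` type, the `check_budget` helper, the stage names, the 800 ms budget, and the language-model and text-to-speech figures are all placeholder assumptions. Only the roughly 150 ms transcription figure comes from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step in a hypothetical voice agent pipeline."""
    name: str
    latency_ms: float


def check_budget(stages: list[Stage], budget_ms: float) -> tuple[float, float]:
    """Return (total latency, remaining headroom) in milliseconds.

    Negative headroom means the pipeline exceeds its budget.
    """
    total = sum(stage.latency_ms for stage in stages)
    return total, budget_ms - total


pipeline = [
    Stage("speech_to_text", 150.0),  # approximate figure cited for Scribe v2 Realtime
    Stage("language_model", 400.0),  # placeholder value, not a measurement
    Stage("text_to_speech", 200.0),  # placeholder value, not a measurement
]

# The 800 ms budget is an assumed target, not a figure from the source.
total_ms, headroom_ms = check_budget(pipeline, budget_ms=800.0)
print(f"total={total_ms:.0f} ms, headroom={headroom_ms:.0f} ms")
```

Tracking headroom per stage makes it easier to see when a more accurate but slower component would push an interactive system past its conversational budget.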

Topics

Speech-to-Text
ElevenLabs Scribe
Real-time ASR
Batch Transcription
Voice AI
Latency Optimization
Word Error Rate

Best for: AI Architect, AI Engineer, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Air Street Press.