Azure Speech at Build 2026: Powering Voice Agents with Real-Time and Life-like Experiences

2026-06-05 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

At Build 2026, Azure Speech announced updates aimed at production-grade, real-time, multilingual voice agents. Voice Live for Foundry Prompt Agents is now generally available; it combines speech-to-text, text-to-speech, turn detection, and interruption handling in a single API. Hosted agents with Voice Live are in public preview, supporting frameworks such as LangChain as well as custom orchestration over WebSocket and WebRTC. The Voice Live API also adds all-in-one speech-to-speech models, GPT-Realtime 1.5 and Azure-Realtime (Public Preview), for natural multilingual output, along with MAI Transcribe-1 (Public Preview) for accurate input and Neural HD V3 voices. The LLM Speech API is generally available, offering LLM-powered transcription and translation across 25 languages and achieving top accuracy on the Open ASR Leaderboard. In addition, new Speech Playgrounds and self-service fine-tuning for custom speech, voice, and avatars are integrated into Microsoft Foundry.
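
To make the turn-detection and interruption-handling idea concrete, here is a minimal sketch of the client-side logic a voice agent needs when a user talks over the agent (often called barge-in). It is a generic illustration, not the Voice Live API: the class `VoiceSession` and the event names `agent_audio`, `agent_audio_done`, and `user_speech_started` are hypothetical and do not come from the original article or from any Azure SDK.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()  # waiting for, or receiving, user speech
    SPEAKING = auto()   # agent audio is queued or playing


@dataclass
class VoiceSession:
    """Hypothetical session state; names are illustrative assumptions."""
    turn: Turn = Turn.LISTENING
    playback_queue: list = field(default_factory=list)
    interruptions: int = 0

    def on_event(self, event: str, payload: str = "") -> None:
        if event == "agent_audio":
            # The agent produced an audio chunk: queue it and take the turn.
            self.playback_queue.append(payload)
            self.turn = Turn.SPEAKING
        elif event == "user_speech_started":
            if self.turn is Turn.SPEAKING:
                # Barge-in: drop the agent audio that has not played yet.
                self.playback_queue.clear()
                self.interruptions += 1
            self.turn = Turn.LISTENING
        elif event == "agent_audio_done":
            # The agent finished its response: hand the turn back.
            self.playback_queue.clear()
            self.turn = Turn.LISTENING
        # Any other event name is ignored in this sketch.


session = VoiceSession()
session.on_event("agent_audio", "chunk-1")
session.on_event("agent_audio", "chunk-2")
session.on_event("user_speech_started")  # user interrupts the agent
print(session.turn.name, session.playback_queue, session.interruptions)
# prints: LISTENING [] 1
```

A managed API that handles turn detection and interruption on the service side would, in principle, take this bookkeeping off the client, which is the simplification the summary describes.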

Key takeaway

For AI engineers building conversational agents, these Azure Speech updates shorten the path to production for real-time, multilingual voice experiences. Explore Voice Live for Foundry Prompt Agents if you want managed orchestration, or hosted agents if you need to integrate a custom framework. Use the new LLM Speech API for highly accurate transcription and translation, and use the Speech Playgrounds in Microsoft Foundry to prototype and fine-tune custom speech, voice, and avatars so that your agents are responsive, natural, and distinctly branded.

Key insights

Azure Speech updates enable rapid development of real-time, customizable, and production-ready voice agents.

Principles

Voice is the default AI interface.
Unified APIs simplify complex voice interactions.
Customization enhances agent identity.

Method

Developers can prototype voice agents in Microsoft Foundry's Speech Playgrounds, then fine-tune custom speech, voice, and avatars for domain-specific needs.

In practice

Use Voice Live for enterprise-ready voice agents.
Deploy hosted agents with preferred frameworks.
Integrate WebRTC for low-latency web and mobile clients.

Topics

Azure Speech
Voice Agents
Microsoft Foundry
LLM Speech API
Custom Voice
Real-time Speech

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.