Public preview: Voice-native agents in Microsoft Foundry

Source: Microsoft Foundry Blog articles · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Microsoft has introduced the public preview of Voice Live with the new Foundry Agent Service, enabling developers to build real-time, speech-to-speech AI agents on Azure more easily. This integration unifies agent orchestration and real-time voice interaction into a single developer experience, streamlining the creation of production-ready voice-native agents. The Foundry Agent Service, now generally available, features a redesigned API and runtime with built-in evaluators for quality, groundedness, safety, and risk. Voice Live is a real-time API offering premium speech capabilities, consolidating functionalities like speech recognition, text-to-speech, natural turn detection, interruption handling, and avatar integration into one API call. It supports over 140 locales with more than 700 voices, including 40+ neural HD conversational voices, and provides advanced conversational enhancements like noise suppression and echo cancellation. This service aims to accelerate development and reduce engineering complexity for voice-enabled agents.

Key takeaway

For AI Architects and CTOs evaluating platforms for conversational AI, the integration of Microsoft's Foundry Agent Service and Voice Live API significantly reduces development complexity and time-to-value for real-time, speech-to-speech agents. You should explore this unified service to build enterprise-grade voice agents that handle planning, reasoning, and tool execution, enabling natural, low-latency interactions and accelerating deployment of intelligent applications.

Key insights

Microsoft's new Foundry Agent Service and Voice Live API simplify building real-time, speech-to-speech AI agents on Azure.

Method

The Voice Live API captures the user's speech and converts it to conversational input. The Agent Service then reasons over that input and executes any tools it needs. Finally, the response is synthesized and streamed back to the user as natural speech.
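The flow described here can be sketched as three stages chained into one turn handler. This is a minimal illustration only, not the Voice Live or Foundry Agent Service SDK: the class `VoiceTurnPipeline`, its methods, and `weather_tool` are hypothetical names, the "audio" is plain UTF-8 text standing in for real speech, and the reasoning step is simple keyword routing rather than a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator


@dataclass
class VoiceTurnPipeline:
    """Hypothetical stand-in for the speech-in, agent, speech-out loop."""

    # Tool name -> callable; an assumption for illustration, not a real API.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def recognize(self, audio: bytes) -> str:
        # Stand-in for speech recognition: the "audio" is just UTF-8 text.
        return audio.decode("utf-8").strip()

    def reason(self, text: str) -> str:
        # Stand-in for agent reasoning: call the first tool whose name
        # appears in the utterance, otherwise echo the input back.
        for name, tool in self.tools.items():
            if name in text.lower():
                return tool(text)
        return f"You said: {text}"

    def synthesize(self, reply: str, chunk_size: int = 16) -> Iterator[bytes]:
        # Stand-in for streaming text-to-speech: yield small byte chunks.
        data = reply.encode("utf-8")
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    def handle_turn(self, audio: bytes) -> Iterator[bytes]:
        # One conversational turn: recognize, reason, then stream the reply.
        text = self.recognize(audio)
        reply = self.reason(text)
        yield from self.synthesize(reply)


def weather_tool(_utterance: str) -> str:
    # Canned answer; a real tool would call an external service.
    return "It is sunny today."


pipeline = VoiceTurnPipeline(tools={"weather": weather_tool})
chunks = list(pipeline.handle_turn(b"What is the weather like?"))
print(b"".join(chunks).decode("utf-8"))
```

Yielding the reply in chunks mirrors the streaming step: a caller can begin playback as soon as the first chunk arrives instead of waiting for the whole response.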

Best for: AI Architect, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.