Introducing updated GPT Voice Models in Microsoft Foundry

Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Microsoft Foundry has released updated GPT voice models, now generally available via API and aimed at production real-time audio applications. "gpt-realtime-mini-2025-12-15" targets low-latency voice agents, with improved prosody and voice fidelity, and supports voice cloning for trusted customers. "gpt-4o-mini-transcribe-2025-12-15" offers up to 50% lower word error rate (WER) on English benchmarks and up to 4x fewer hallucinations on silence. "gpt-4o-mini-tts-2025-12-15" provides multilingual speech synthesis with 35% fewer word errors on multilingual benchmarks and up to 3x lower WER in non-English languages; it also supports voice cloning. Pricing is unchanged for all three models.
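WER, the metric quoted for the transcription and speech-synthesis models, is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The Python sketch below illustrates that standard definition only; the function name and sample sentences are hypothetical, and it is not the benchmark harness behind the figures in the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real evaluations usually also normalize
    punctuation, numerals, and casing before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: two errors in five reference words, then one.
reference = "turn on the kitchen lights"
print(word_error_rate(reference, "turn on kitchen light"))      # 0.4
print(word_error_rate(reference, "turn on the kitchen light"))  # 0.2
```

In this toy case the second transcript has half the WER of the first (0.2 versus 0.4), which is what a "50% lower WER" claim means in relative terms.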

Key takeaway

For AI Architects and NLP Engineers building real-time voice applications, the updated GPT models in Microsoft Foundry improve accuracy, latency, and multilingual support at no added cost. Evaluate "gpt-realtime-mini-2025-12-15" for voice agents, "gpt-4o-mini-transcribe-2025-12-15" for transcription, and "gpt-4o-mini-tts-2025-12-15" for speech synthesis; voice cloning, where available, can help keep a consistent brand voice.

Key insights

Microsoft Foundry's new GPT voice models offer enhanced real-time performance, accuracy, and multilingual capabilities at existing price points.

Best for: AI Architect, NLP Engineer, CTO, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.