Introducing updated GPT Voice Models in Microsoft Foundry

2025-12-17 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Microsoft Foundry has released updated GPT voice models, generally available through the API and aimed at production real-time audio applications. "gpt-realtime-mini-2025-12-15" targets low-latency voice agents, with improved prosody and voice fidelity; voice cloning is available to trusted customers. "gpt-4o-mini-transcribe-2025-12-15" delivers up to 50% lower word error rate (WER) on English benchmarks and cuts hallucinations on silence by up to 4x. "gpt-4o-mini-tts-2025-12-15" provides multilingual speech synthesis with 35% fewer word errors on multilingual benchmarks and up to 3x lower WER in non-English languages, and it also supports voice cloning. Pricing is unchanged for all three models.
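Because the accuracy claims above are stated in terms of WER, a small worked example may help. The sketch below computes WER as word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the three transcripts are invented for illustration; they are not taken from the article or from any benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented transcripts, for illustration only.
reference = "turn on the kitchen lights please"
baseline = "turn on the chicken light please"   # two wrong words
improved = "turn on the kitchen light please"   # one wrong word

baseline_wer = word_error_rate(reference, baseline)
improved_wer = word_error_rate(reference, improved)
relative_reduction = 1 - improved_wer / baseline_wer
```

In this toy case the baseline has two wrong words out of six and the improved transcript has one, so the relative reduction is one half. A claim of "up to 50% lower WER" describes this kind of relative change, not a 50-point drop in the absolute error rate.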

Key takeaway

For AI architects and NLP engineers building real-time voice applications, the updated GPT models in Microsoft Foundry offer gains in accuracy, latency, and multilingual support at no added cost. Evaluate "gpt-realtime-mini-2025-12-15" for voice agents, "gpt-4o-mini-transcribe-2025-12-15" for transcription, and "gpt-4o-mini-tts-2025-12-15" for speech synthesis. Where a consistent brand voice matters, consider voice cloning.

Key insights

Microsoft Foundry's new GPT voice models offer enhanced real-time performance, accuracy, and multilingual capabilities at existing price points.

Principles

Reliability and latency are critical for production voice models.
Voice cloning helps keep a brand voice consistent.
Cost-efficiency is maintained despite performance upgrades.

In practice

Use "gpt-realtime-mini" for low-latency voice agents.
Employ "gpt-4o-mini-transcribe" for robust real-time transcription.
Apply "gpt-4o-mini-tts" for natural multilingual speech synthesis.
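The mapping above can be sketched as a small routing table. The model IDs come from the announcement; everything else is an assumption made for illustration: the task keys, the helper names, the environment variable names, the idea that each deployment is named after its model, and the idea that the transcription model is reachable through the audio transcription call in the `openai` Python SDK's `AzureOpenAI` client. Check the Foundry documentation for the endpoint and API version that apply to your resource.

```python
import os

# Model IDs as named in the announcement; the task keys are hypothetical.
VOICE_MODELS = {
    "voice_agent": "gpt-realtime-mini-2025-12-15",
    "transcription": "gpt-4o-mini-transcribe-2025-12-15",
    "speech_synthesis": "gpt-4o-mini-tts-2025-12-15",
}


def model_for(task: str) -> str:
    """Return the model ID suggested for a task, or raise ValueError."""
    try:
        return VOICE_MODELS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(VOICE_MODELS)}"
        ) from None


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file (sketch; requires the `openai` package).

    Assumes a deployment named after the model ID and credentials in the
    environment variables read below. Both are assumptions, not details
    from the article.
    """
    from openai import AzureOpenAI  # imported lazily so model_for() works without it

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    )
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model_for("transcription"),  # deployment name (assumed)
            file=audio_file,
        )
    return result.text
```

Keeping the model IDs in one table means a later dated release can be adopted by editing a single line, which makes it easier to compare an old and a new model on the same audio.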

Topics

GPT Voice Models
Speech Synthesis
Speech Transcription
Voice Cloning
Microsoft Foundry API

Best for: AI Architect, NLP Engineer, CTO, AI Engineer, Machine Learning Engineer, Software Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.