Advancing voice intelligence with new models in the API

2026-05-07 · Source: OpenAI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, long

Summary

OpenAI has released three new real-time voice models in its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 brings GPT-5-class reasoning to natural, complex conversations. Its features include preambles, parallel tool calls, improved recovery, a 128K context window, stronger domain understanding, and adjustable reasoning effort (minimal, low, medium, high, xhigh). On Big Bench Audio, it shows a 15.2% increase in audio intelligence over GPT-Realtime-1.5. GPT-Realtime-Translate provides live speech translation from more than 70 input languages into 13 output languages, while GPT-Realtime-Whisper offers low-latency streaming speech-to-text. Together, the models target voice-to-action, systems-to-voice, and voice-to-voice applications. Pricing is $32 per 1M input tokens and $64 per 1M output tokens for GPT-Realtime-2, $0.034 per minute for GPT-Realtime-Translate, and $0.017 per minute for GPT-Realtime-Whisper.
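To make the pricing concrete, here is a minimal cost-estimation sketch in Python. The rates are the ones quoted in the summary. The function names are hypothetical, and the sketch assumes simple linear billing with no minimums, rounding, or discounts, which the summary does not confirm.

```python
# Rates as quoted in the summary (USD). Linear billing is an assumption.
REALTIME2_INPUT_PER_MILLION = 32.0    # per 1M input tokens
REALTIME2_OUTPUT_PER_MILLION = 64.0   # per 1M output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated GPT-Realtime-2 cost in USD for the given token counts."""
    total = (input_tokens * REALTIME2_INPUT_PER_MILLION
             + output_tokens * REALTIME2_OUTPUT_PER_MILLION)
    return total / 1_000_000


def translate_cost(minutes: float) -> float:
    """Estimated GPT-Realtime-Translate cost in USD for the given minutes."""
    return minutes * TRANSLATE_PER_MINUTE


def whisper_cost(minutes: float) -> float:
    """Estimated GPT-Realtime-Whisper cost in USD for the given minutes."""
    return minutes * WHISPER_PER_MINUTE
```

Under these assumptions, an hour of audio costs about twice as much to translate as to transcribe, since the quoted per-minute rate for Translate is double that for Whisper.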

Key takeaway

For machine learning engineers building conversational AI, OpenAI's new Realtime API models offer significant advances in voice intelligence. Explore GPT-Realtime-2 for agents that require complex reasoning and tool use, GPT-Realtime-Translate for live multilingual applications, and GPT-Realtime-Whisper for low-latency transcription. Use the adjustable reasoning levels of GPT-Realtime-2 to balance performance and cost for your specific use cases, and integrate the provided safety guardrails.

Key insights

OpenAI's new real-time voice models enable intelligent, natural, and actionable voice interfaces for diverse applications.

Principles

Voice agents require reasoning and context management.
Real-time processing enhances natural conversation flow.
Adjustable reasoning levels let you trade latency against reasoning complexity.
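The context-management principle can be illustrated with a small history-trimming sketch. It keeps the most recent conversation turns that fit a token budget. The function name is hypothetical, token counts are assumed to be supplied by the caller, and "128K" is read here as 128,000 tokens, which the summary does not specify.

```python
from collections import deque

# Assumption: "128K" is taken as 128,000 tokens; the exact limit may differ.
CONTEXT_LIMIT_TOKENS = 128_000


def trim_history(turns, limit=CONTEXT_LIMIT_TOKENS):
    """Return the most recent turns whose token counts fit within `limit`.

    `turns` is a list of (text, token_count) pairs, oldest first.
    Walks backward from the newest turn and stops at the first turn
    that would push the running total over the limit.
    """
    kept = deque()
    total = 0
    for text, tokens in reversed(turns):
        if total + tokens > limit:
            break
        kept.appendleft((text, tokens))
        total += tokens
    return list(kept)
```

Dropping the oldest turns first is only one possible policy; summarizing older turns instead of discarding them is another option this sketch does not cover.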

Method

The models combine reasoning, translation, and transcription capabilities. They support dynamic voice interactions through features such as parallel tool calls, a 128K context window, and adjustable reasoning effort.
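As one way to think about adjustable reasoning effort, the sketch below steps the effort level down when a response exceeds a latency budget and up when there is ample headroom. The five level names come from the summary. The function, the latency budget, and the 50% headroom threshold are illustrative assumptions, not part of the API.

```python
# Effort levels as listed in the summary, ordered from least to most effort.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def adjust_effort(current: str, observed_latency_ms: float,
                  budget_ms: float) -> str:
    """Pick the next effort level from the last observed response latency.

    Steps down one level when latency exceeds the budget, steps up one
    level when latency is under half the budget (an arbitrary threshold),
    and otherwise keeps the current level.
    """
    index = EFFORT_LEVELS.index(current)
    if observed_latency_ms > budget_ms:
        index = max(index - 1, 0)
    elif observed_latency_ms < 0.5 * budget_ms:
        index = min(index + 1, len(EFFORT_LEVELS) - 1)
    return EFFORT_LEVELS[index]
```

For example, with an 800 ms budget, a 900 ms response at "medium" would move the next turn to "low", while a 300 ms response would move it to "high".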

In practice

Use GPT-Realtime-2 for complex conversational agents.
Implement GPT-Realtime-Translate for live multilingual support.
Leverage GPT-Realtime-Whisper for low-latency transcription.
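The three recommendations above amount to a simple routing rule, sketched here in Python. The enum and function are hypothetical, and the strings are the product names as written in this summary, not confirmed API model identifiers.

```python
from enum import Enum


class VoiceTask(Enum):
    """Hypothetical task categories matching the three recommendations."""
    CONVERSATIONAL_AGENT = "conversational_agent"
    LIVE_TRANSLATION = "live_translation"
    TRANSCRIPTION = "transcription"


# Product names from the summary; actual API identifiers may differ.
MODEL_FOR_TASK = {
    VoiceTask.CONVERSATIONAL_AGENT: "GPT-Realtime-2",
    VoiceTask.LIVE_TRANSLATION: "GPT-Realtime-Translate",
    VoiceTask.TRANSCRIPTION: "GPT-Realtime-Whisper",
}


def pick_model(task: VoiceTask) -> str:
    """Return the model name the summary recommends for the given task."""
    return MODEL_FOR_TASK[task]
```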

Topics

Realtime Voice Models
GPT-Realtime-2
GPT-Realtime-Translate
GPT-Realtime-Whisper
Voice AI API

Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, NLP Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by OpenAI News.