😺 OpenAI's GPT-Realtime-2 is coming for call centers

2026-05-01 · Source: The Neuron · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, extended

Summary

OpenAI has released three new voice models. The flagship, GPT-Realtime-2, pairs GPT-5-level reasoning with human-speed speech-to-speech output, addressing the long-standing trade-off between conversational latency and intelligence in AI voice agents. Compared with its predecessor, GPT-Realtime-1.5, it improved from 81.4% to 96.6% on Big Bench Audio and from 34.7% to 48.5% on Audio MultiChallenge. Its context window has been expanded to 128K tokens, enough to hold extensive customer interaction histories. To reduce perceived latency, the model generates "preambles" (conversational fillers) while complex reasoning runs in the background. Two more cost-effective models, GPT-Realtime-Mini and Realtime-Nano, were introduced alongside it for high-volume support applications. The release is positioned to change customer service, drive-thrus, and other voice-based interactions; early deployments include Zillow for voice search and Deutsche Telekom for live-translated support across 14 European markets.

Key takeaway

For AI Product Managers evaluating voice agent solutions, OpenAI's GPT-Realtime-2 is a significant step toward combining advanced reasoning with real-time conversational flow. Consider piloting it at high-volume customer interaction points, especially where earlier AI solutions failed because of latency or limited intelligence. Note that the default reasoning effort is "low" and may need to be raised explicitly for best results.

Key insights

OpenAI's new voice models combine GPT-5-level reasoning with real-time speech, easing the earlier trade-off between latency and intelligence.

Principles

AI can mask processing delays with conversational fillers.
Context window size directly impacts AI's utility in complex interactions.

Method

OpenAI's voice models generate "preambles" (conversational fillers) to mask the time required for GPT-5-level reasoning, so that speech-to-speech interactions still feel human-speed to the caller.
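
The preamble idea can be illustrated with a small sketch. This is not OpenAI's implementation or API: `slow_reasoning`, `respond`, the filler phrase, and the 50 ms threshold are all hypothetical names and values chosen for illustration. The sketch only shows the general pattern of speaking a filler when a background task has not finished within a short deadline.

```python
import asyncio


async def slow_reasoning(question: str, delay_s: float) -> str:
    """Hypothetical stand-in for the model's background reasoning."""
    await asyncio.sleep(delay_s)
    return f"Answer to: {question}"


async def respond(question: str, delay_s: float, filler_after_s: float = 0.05) -> list[str]:
    """Return the utterances that would be spoken, in order.

    If reasoning has not finished within `filler_after_s` seconds (an assumed
    threshold), a preamble is spoken first and the answer follows when ready.
    """
    spoken: list[str] = []
    task = asyncio.create_task(slow_reasoning(question, delay_s))
    try:
        # shield() keeps the reasoning task running if the timeout fires.
        answer = await asyncio.wait_for(asyncio.shield(task), timeout=filler_after_s)
    except asyncio.TimeoutError:
        spoken.append("Let me check that for you.")  # the preamble (filler)
        answer = await task  # keep waiting for the real answer
    spoken.append(answer)
    return spoken


def run_demo(question: str, delay_s: float) -> list[str]:
    """Synchronous wrapper so the sketch can be called from plain code."""
    return asyncio.run(respond(question, delay_s))
```

In this sketch a fast answer is spoken directly, while a slow one is preceded by the filler, so the caller never sits in silence. A production system would stream audio rather than collect strings.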

In practice

Deploy AI voice agents for customer support and scheduling.
Use the expanded context window to carry a customer's full interaction history into the conversation.
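
As a rough illustration of working within a 128K-token window, the sketch below keeps the most recent turns of an interaction history that fit a token budget. Everything here is an assumption made for illustration: the `Turn` type, the `fit_history` helper, the 8,000-token reserve for the model's reply, and the 4-characters-per-token estimate, which is a crude heuristic and not the model's real tokenizer.

```python
from dataclasses import dataclass

CONTEXT_WINDOW_TOKENS = 128_000  # context size cited in the summary


@dataclass
class Turn:
    speaker: str
    text: str


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)


def fit_history(
    turns: list[Turn],
    budget: int = CONTEXT_WINDOW_TOKENS,
    reserve: int = 8_000,
) -> list[Turn]:
    """Keep the most recent turns whose estimated size fits in budget - reserve.

    `reserve` is a hypothetical allowance left free for the model's response.
    """
    available = budget - reserve
    kept: list[Turn] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = estimate_tokens(turn.text)
        if used + cost > available:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a large window most customer histories fit whole; the trimming path matters only for very long accounts, and dropping the oldest turns first is one simple policy among several (summarizing older turns is another).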

Topics

GPT-Realtime-2
Voice AI
Call Center Automation
Natural Language Autoencoders
AI Safety

Best for: CTO, VP of Engineering/Data, AI Product Manager, Tech Journalist, Director of AI/ML, Consultant

Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.