GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs

· Source: AINews · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy · Depth: Advanced, extended

Summary

OpenAI has launched three new real-time voice APIs: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is positioned as OpenAI's most intelligent voice model. It brings "GPT-5-class reasoning" to real-time voice agents, along with tool use, interruption handling, and support for longer conversations. Its context window grows to 128K tokens, up from 32K, while audio pricing stays at $1.15 per hour of input and $4.61 per hour of output. The reported benchmarks show significant gains: GPT-Realtime-2 scores 96.6% on Big Bench Audio and 70.8% APR on instruction retention in Scale AI's Audio MultiChallenge S2S.

GPT-Realtime-Translate provides live speech translation from 70+ input languages into 13 output languages, and GPT-Realtime-Whisper provides low-latency streaming transcription. All three models are available in the Realtime API; upgrades to ChatGPT voice are still pending.
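To make the quoted audio rates concrete, here is a minimal cost estimator. It is a sketch under stated assumptions: cost is taken to scale linearly with audio duration at $1.15 per hour of input and $4.61 per hour of output, text tokens and any other charges are ignored, and the function name `estimate_audio_cost` is hypothetical.

```python
# Audio rates quoted in the summary (USD per hour of audio).
INPUT_USD_PER_HOUR = 1.15
OUTPUT_USD_PER_HOUR = 4.61


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate audio-only cost in USD for one session.

    Assumes cost scales linearly with audio duration at the hourly rates
    above; text tokens and any other charges are ignored.
    """
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost


# Example: a 10-minute support call where the caller speaks about 6 minutes
# and the agent speaks about 4 minutes.
cost = estimate_audio_cost(input_minutes=6, output_minutes=4)
print(f"${cost:.4f}")
```

Because output audio costs roughly four times as much per hour as input audio, the split between caller speech and agent speech matters more to the bill than total call length.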

Key takeaway

For CTOs and VPs of Engineering evaluating real-time voice agent solutions, OpenAI's new GPT-Realtime-2, -Translate, and -Whisper models represent a significant leap in capability. Teams should explore these APIs for voice applications that need to be more intelligent, more responsive, and more context-aware, particularly in customer support, live translation, and hands-free workflows. Getting full value from features like the 128K context window and improved interruption handling will require designing stateful real-time systems.
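One way to approach the stateful design mentioned above is to keep a client-side record of the conversation that stays within a token budget and reflects what the caller actually heard when they interrupt. This is a hypothetical sketch: `VoiceSession`, `Turn`, the 4,000-token reserve, oldest-first trimming, and proportional truncation on barge-in are illustrative assumptions, not the Realtime API's own context management, which the source does not describe. Token counts are supplied by the caller.

```python
from dataclasses import dataclass, field

# Context window size quoted in the summary (tokens).
CONTEXT_WINDOW_TOKENS = 128_000


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str
    tokens: int  # token count supplied by the caller
    interrupted: bool = False


@dataclass
class VoiceSession:
    """Hypothetical client-side record of a realtime voice conversation."""

    budget: int = CONTEXT_WINDOW_TOKENS
    reserve: int = 4_000  # headroom left for the next response (assumption)
    turns: list[Turn] = field(default_factory=list)

    def used(self) -> int:
        return sum(t.tokens for t in self.turns)

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)
        # Drop the oldest turns until history fits in budget minus reserve,
        # but always keep the newest turn.
        while len(self.turns) > 1 and self.used() > self.budget - self.reserve:
            self.turns.pop(0)

    def interrupt(self, spoken_fraction: float) -> None:
        """Caller barged in: keep only the part of the last assistant turn that was played."""
        if not self.turns or self.turns[-1].role != "assistant":
            return
        fraction = min(max(spoken_fraction, 0.0), 1.0)
        last = self.turns[-1]
        last.text = last.text[: int(len(last.text) * fraction)]
        # Proportional token scaling is an approximation.
        last.tokens = int(last.tokens * fraction)
        last.interrupted = True


# Usage with a deliberately small budget so trimming is visible.
session = VoiceSession(budget=1_000, reserve=100)
session.add(Turn("user", "Where is my order?", tokens=500))
session.add(Turn("assistant", "Your order shipped yesterday and should arrive Friday.", tokens=300))
session.interrupt(spoken_fraction=0.5)
session.add(Turn("user", "Can you cancel it?", tokens=400))
print(len(session.turns), session.used())
```

Truncating the interrupted assistant turn keeps the stored history aligned with the audio the caller heard, so the agent does not later assume the caller heard the unplayed remainder.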

Key insights

OpenAI's new Realtime API models significantly advance voice AI with stronger reasoning, a longer context window, and better handling of real-time interaction.

Method

OpenAI's voice models combine adjustable reasoning effort, preambles, parallel tool calls, and recovery behaviors to handle complex real-time voice interactions.
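The preamble, parallel tool call, and recovery behaviors can be illustrated with a small asyncio sketch of how an application might service one agent turn. Everything here is hypothetical: `lookup_order`, `check_weather`, `speak`, and `handle_turn` are placeholder names, the tools are simulated, and adjustable reasoning effort is a model setting that this sketch does not model.

```python
import asyncio
from typing import Awaitable, Callable

# A tool is an async function that returns a string result.
Tool = Callable[..., Awaitable[str]]


# Hypothetical tools with simulated latency; the second always fails.
async def lookup_order(order_id: str) -> str:
    await asyncio.sleep(0.01)
    return f"order {order_id}: shipped"


async def check_weather(city: str) -> str:
    await asyncio.sleep(0.01)
    raise ConnectionError(f"weather service for {city} unavailable")


async def speak(text: str, log: list[str]) -> None:
    # Stand-in for streaming synthesized audio to the caller.
    log.append(text)


async def handle_turn(
    calls: list[tuple[Tool, tuple]], log: list[str], timeout: float = 1.0
) -> list[str]:
    # Preamble: acknowledge the request before the tools run.
    await speak("One moment while I check that.", log)
    # Parallel tool calls: run every requested tool concurrently, each with a timeout.
    results = await asyncio.gather(
        *(asyncio.wait_for(tool(*args), timeout) for tool, args in calls),
        return_exceptions=True,
    )
    # Recovery: convert failures into fallback text instead of aborting the turn.
    outputs: list[str] = []
    for (tool, _), result in zip(calls, results):
        if isinstance(result, BaseException):
            outputs.append(f"{tool.__name__} failed: {result}")
        else:
            outputs.append(result)
    return outputs


log: list[str] = []
outputs = asyncio.run(
    handle_turn([(lookup_order, ("A123",)), (check_weather, ("Paris",))], log)
)
print(log)
print(outputs)
```

Passing `return_exceptions=True` to `asyncio.gather` lets one failed tool degrade into a fallback message the agent can speak, instead of the first exception propagating out of `handle_turn` and aborting the response.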

Best for: CTO, VP of Engineering/Data, NLP Engineer, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.