How OpenAI delivers low-latency voice AI at scale

· Source: OpenAI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Software Development & Engineering · Depth: Advanced, long

Summary

OpenAI has rearchitected its WebRTC stack to deliver low-latency voice AI at scale for more than 900 million weekly active users. The redesign targets three requirements: global reach, fast connection setup, and a stable media round-trip time. The new "relay plus transceiver" architecture preserves standard WebRTC client behavior while optimizing internal packet routing. It addresses three constraints: media termination that needs one port per session fits poorly with Kubernetes, stateful ICE and DTLS sessions need a stable owner, and global routing needs a low-latency first hop. Instead of an SFU model, OpenAI adopted a transceiver model in which an edge service terminates client connections and converts media into simpler internal protocols for backend inference, which supports continuous audio streaming and conversational AI experiences.

Key takeaway

For AI engineers building real-time voice applications, a split relay-transceiver WebRTC architecture can improve scalability and reduce latency, especially on Kubernetes. The split allows a small, fixed public UDP footprint and deterministic first-packet routing, both of which matter for conversational fluidity and global reach. Consider encoding routing metadata into protocol-native fields such as the ICE ufrag, so the relay can route a packet without consulting extra session state.
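One way to picture the ufrag idea is the sketch below. The field layout (a 2-byte region id, a 4-byte transceiver id, and 6 random bytes) and the names `make_ufrag` and `parse_ufrag` are assumptions made for illustration; the source does not describe OpenAI's actual encoding. The only hard constraint used here is that ICE restricts a ufrag to letters, digits, `+` and `/`, which the standard base64 alphabet satisfies as long as no `=` padding is produced.

```python
import base64
import secrets
import struct
from typing import Tuple

# Hypothetical layout (not from the source): region id (2 bytes),
# transceiver id (4 bytes), then 6 random bytes so ufrags stay unique.
_ROUTING = struct.Struct("!HI")
_NONCE_BYTES = 6  # 6 + 6 = 12 bytes total, a multiple of 3, so no "=" padding


def make_ufrag(region_id: int, transceiver_id: int) -> str:
    """Pack routing metadata into a string that is a valid ICE ufrag."""
    payload = _ROUTING.pack(region_id, transceiver_id)
    payload += secrets.token_bytes(_NONCE_BYTES)
    # Standard base64 uses only A-Z, a-z, 0-9, "+" and "/".
    return base64.b64encode(payload).decode("ascii")


def parse_ufrag(ufrag: str) -> Tuple[int, int]:
    """Recover (region_id, transceiver_id) from a ufrag built above."""
    raw = base64.b64decode(ufrag, validate=True)
    if len(raw) < _ROUTING.size:
        raise ValueError("ufrag too short to carry routing metadata")
    region_id, transceiver_id = _ROUTING.unpack_from(raw)
    return region_id, transceiver_id
```

In this sketch the signaling service would call `make_ufrag` when it writes the SDP answer, and any relay could later call `parse_ufrag` on a packet's ufrag with no shared lookup table. A production encoding would likely also need integrity protection, which is out of scope here.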

Key insights

OpenAI's WebRTC rearchitecture uses a split relay-transceiver model for scalable, low-latency voice AI.

Method

The architecture splits packet routing (relay) from protocol termination (transceiver). The relay uses the ICE ufrag for first-packet routing to the owning transceiver, which handles all WebRTC session state.
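The relay side of this split can be sketched as follows. The STUN header layout, the magic cookie, and the USERNAME attribute type come from the STUN specification, and ICE connectivity checks carry a USERNAME of the form `<receiver ufrag>:<sender ufrag>`. Everything else is an assumption for illustration: the `Relay` class, its `owners` table, and the choice to pin each client address to a transceiver after the first packet are hypothetical and not taken from the source.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442
STUN_BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
STUN_HEADER_LEN = 20

Addr = Tuple[str, int]


def server_ufrag_from_stun(packet: bytes) -> Optional[str]:
    """Return the server-side ufrag from a STUN Binding Request, else None."""
    if len(packet) < STUN_HEADER_LEN:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if msg_type != STUN_BINDING_REQUEST or cookie != STUN_MAGIC_COOKIE:
        return None
    end = min(STUN_HEADER_LEN + msg_len, len(packet))
    offset = STUN_HEADER_LEN
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        if offset + 4 + attr_len > end:
            return None  # truncated attribute
        if attr_type == ATTR_USERNAME:
            value = packet[offset + 4 : offset + 4 + attr_len]
            # The part before ":" is the ufrag of the agent receiving the check.
            return value.decode("utf-8", "replace").split(":", 1)[0]
        offset += 4 + ((attr_len + 3) & ~3)  # values are padded to 4 bytes
    return None


class Relay:
    """Hypothetical relay: routes by ufrag once, then by client address."""

    def __init__(self, owners: Dict[str, Addr]) -> None:
        self.owners = owners              # ufrag -> owning transceiver
        self.flows: Dict[Addr, Addr] = {}  # client address -> transceiver

    def route(self, client: Addr, packet: bytes) -> Optional[Addr]:
        owner = self.flows.get(client)
        if owner is None:
            ufrag = server_ufrag_from_stun(packet)
            owner = self.owners.get(ufrag) if ufrag else None
            if owner is not None:
                self.flows[client] = owner  # later DTLS/SRTP carry no ufrag
        return owner
```

The relay never touches ICE, DTLS, or SRTP state. It reads one attribute from the first packet and forwards everything else unchanged, which leaves the transceiver as the sole owner of the session.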

Best for: AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by OpenAI News.