OpenAI has yet another new coding model and this time it's really fast

· Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

OpenAI has launched GPT-5.3-Codex-Spark, a compact coding model optimized for real-time programming, capable of generating over 1,000 tokens per second. This model, the first product from OpenAI's partnership with Cerebras, runs on Cerebras' Wafer Scale Engine 3 AI accelerator. Available as a Research Preview for ChatGPT Pro users via the Codex app, CLI, and VS Code extension, Codex-Spark prioritizes speed and interactivity over autonomous operation. While it achieves 58.4% accuracy on Terminal-Bench 2.0 compared to 77.3% for the larger GPT-5.3-Codex, it completes tasks on SWE-Bench Pro in two to three minutes, significantly faster than the 15 to 17 minutes required by GPT-5.3-Codex. OpenAI also optimized its inference stack, reducing per-roundtrip overhead by 80% and time-to-first-token by 50%.

Key takeaway

For MLOps engineers and developers building real-time coding tools, GPT-5.3-Codex-Spark offers a significant speed advantage, enabling immediate feedback and iterative development. You should consider integrating this model for interactive programming environments where low latency is critical, even if it means a slight trade-off in accuracy compared to larger, more autonomous models. Explore its capabilities via the ChatGPT Pro Codex app, CLI, or VS Code extension.

Key insights

OpenAI's new Codex-Spark model prioritizes real-time coding speed over raw accuracy using specialized Cerebras hardware.

Principles

Method

OpenAI optimized its inference stack, streamlined client-server streaming, and reworked session startup to achieve high-speed, low-latency model responses.

In practice

Topics

Best for: MLOps Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.