OpenAI has yet another new coding model and this time it's really fast

2026-02-12 · Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

OpenAI has launched GPT-5.3-Codex-Spark, a compact coding model optimized for real-time programming, capable of generating over 1,000 tokens per second. This model, the first product from OpenAI's partnership with Cerebras, runs on Cerebras' Wafer Scale Engine 3 AI accelerator. Available as a Research Preview for ChatGPT Pro users via the Codex app, CLI, and VS Code extension, Codex-Spark prioritizes speed and interactivity over autonomous operation. While it achieves 58.4% accuracy on Terminal-Bench 2.0 compared to 77.3% for the larger GPT-5.3-Codex, it completes tasks on SWE-Bench Pro in two to three minutes, significantly faster than the 15 to 17 minutes required by GPT-5.3-Codex. OpenAI also optimized its inference stack, reducing per-roundtrip overhead by 80% and time-to-first-token by 50%.

Key takeaway

For MLOps engineers and developers building real-time coding tools, GPT-5.3-Codex-Spark offers a significant speed advantage, enabling immediate feedback and iterative development. Consider it for interactive programming environments where low latency is critical, but weigh the accuracy cost: on Terminal-Bench 2.0 it scores 58.4% against 77.3% for the larger, more autonomous GPT-5.3-Codex. ChatGPT Pro users can try the Research Preview through the Codex app, CLI, or VS Code extension.

Key insights

OpenAI's new Codex-Spark model runs on specialized Cerebras hardware and prioritizes real-time coding speed over raw accuracy.

Principles

Latency matters as much as intelligence for interactive work.
Smaller models can trade precision for speed effectively.
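
The size of that trade can be worked out from the figures quoted in the summary. The short Python sketch below is illustrative only: the dictionary and function names are made up for this example, and the speedup is a rough range derived from the reported minute ranges, not a number published by OpenAI.

```python
# Figures as quoted in the summary; names below are illustrative, not an OpenAI API.
TERMINAL_BENCH_PCT = {"GPT-5.3-Codex-Spark": 58.4, "GPT-5.3-Codex": 77.3}
SWE_BENCH_PRO_MINUTES = {
    "GPT-5.3-Codex-Spark": (2, 3),    # reported range, in minutes
    "GPT-5.3-Codex": (15, 17),
}


def accuracy_gap(scores):
    """Percentage-point gap between the larger model and Codex-Spark."""
    return round(scores["GPT-5.3-Codex"] - scores["GPT-5.3-Codex-Spark"], 1)


def speedup_range(minutes):
    """Lowest and highest speedup implied by the two reported time ranges."""
    spark_lo, spark_hi = minutes["GPT-5.3-Codex-Spark"]
    codex_lo, codex_hi = minutes["GPT-5.3-Codex"]
    # Conservative: slowest Spark run vs. fastest Codex run. Optimistic: the reverse.
    return codex_lo / spark_hi, codex_hi / spark_lo


if __name__ == "__main__":
    low, high = speedup_range(SWE_BENCH_PRO_MINUTES)
    print(f"Accuracy gap: {accuracy_gap(TERMINAL_BENCH_PCT)} percentage points")
    print(f"Speedup on SWE-Bench Pro: {low:.1f}x to {high:.1f}x")
```

On these numbers, Codex-Spark gives up roughly 19 percentage points on Terminal-Bench 2.0 in exchange for finishing SWE-Bench Pro tasks about 5 to 8.5 times faster.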

Method

OpenAI optimized its inference stack, streamlined client-server streaming, and reworked session startup to achieve high-speed, low-latency model responses.
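
To see why those three pieces matter together, here is a minimal Python sketch of a response-time budget. It assumes a simple additive model (time to first token, plus generation time, plus a fixed overhead per client-server roundtrip), which is a simplification and not OpenAI's description of its stack. The baseline values are hypothetical placeholders; only the 80% and 50% reductions and the 1,000 tokens-per-second rate come from the summary.

```python
def response_time_s(ttft_s, output_tokens, tokens_per_s, roundtrips, overhead_s):
    """Additive latency model (an assumption): TTFT + generation + roundtrip overhead."""
    return ttft_s + output_tokens / tokens_per_s + roundtrips * overhead_s


# Hypothetical baseline, chosen only to make the arithmetic concrete.
BASELINE_TTFT_S = 1.0
BASELINE_OVERHEAD_S = 0.2
ROUNDTRIPS = 5
OUTPUT_TOKENS = 500
TOKENS_PER_S = 1000  # "over 1,000 tokens per second" per the summary

before = response_time_s(
    BASELINE_TTFT_S, OUTPUT_TOKENS, TOKENS_PER_S, ROUNDTRIPS, BASELINE_OVERHEAD_S
)
after = response_time_s(
    BASELINE_TTFT_S * (1 - 0.50),      # time-to-first-token cut by 50%
    OUTPUT_TOKENS,
    TOKENS_PER_S,
    ROUNDTRIPS,
    BASELINE_OVERHEAD_S * (1 - 0.80),  # per-roundtrip overhead cut by 80%
)

if __name__ == "__main__":
    print(f"before: {before:.2f} s, after: {after:.2f} s")
```

Under these placeholder values, generation itself takes only half a second, so the fixed costs dominate the total. That is the reason overhead and startup reductions matter once token generation is this fast.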

In practice

Use Codex-Spark for interactive coding tasks.
Expect separate rate limits, because the model runs on specialized hardware.

Topics

GPT-5.3-Codex-Spark
Real-time Code Generation
AI Inference Optimization
Cerebras Wafer Scale Engine
Coding Benchmarks

Best for: MLOps Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.