OpenAI Releases a Research Preview of GPT‑5.3-Codex-Spark: A 15x Faster AI Coding Model Delivering Over 1000 Tokens Per Second on Cerebras Hardware

2026-02-12 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

OpenAI has released GPT-5.3 Codex-Spark, a research preview of an AI coding model that generates over 1000 tokens per second, a 15x speedup over the company's flagship model. The gain is attributed to running the model on the Cerebras Wafer-Scale Engine 3 (WSE-3), which mitigates conventional GPU limitations by keeping all computation on a single silicon wafer. The model is also served over a new persistent WebSocket connection, which reduces networking overhead by 80%. Together, the specialized hardware and the leaner communication path enable near-instantaneous code generation.
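
As a rough illustration of what these figures mean in practice, the sketch below converts the quoted throughput into wall-clock time. The baseline rate is not stated in the article; it is inferred here by dividing 1000 tokens per second by the 15x speedup, and the 2000-token response size is an arbitrary assumption chosen for the example.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the summary.
# Assumption: "15x faster" refers to the same tokens-per-second metric.

SPARK_TPS = 1000.0                   # quoted lower bound ("over 1000 tokens per second")
SPEEDUP = 15.0                       # quoted speedup over the flagship model
BASELINE_TPS = SPARK_TPS / SPEEDUP   # inferred, about 66.7 tokens per second


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response size, not from the article
    print(f"baseline: {generation_seconds(response_tokens, BASELINE_TPS):.1f} s")
    print(f"spark:    {generation_seconds(response_tokens, SPARK_TPS):.1f} s")
```

Under these assumptions, a 2000-token response drops from roughly 30 seconds to about 2 seconds. Network and queueing time are not modeled.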

Key takeaway

For engineering leaders evaluating AI coding assistants, GPT-5.3 Codex-Spark's 15x speedup and throughput of over 1000 tokens per second on Cerebras hardware point to a shift toward near-instantaneous code generation. Consider how high-throughput models like this could fit into your development workflows to shorten iteration cycles and improve developer productivity, and whether those gains would justify investing in specialized AI accelerators.

Key insights

Specialized hardware and optimized connections dramatically accelerate AI coding model inference.

Principles

Wafer-scale engines mitigate conventional GPU bottlenecks by keeping computation on a single wafer.
Persistent connections reduce network overhead.

Method

GPT-5.3 Codex-Spark achieves its speed by running on the Cerebras WSE-3 and by using a persistent WebSocket connection to cut networking overhead.

In practice

Evaluate Cerebras WSE-3 for latency-sensitive AI inference workloads.
Use persistent connections, such as WebSockets, for low-latency AI services.
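
The persistent-connection idea can be sketched with Python's standard library alone. The example below is not OpenAI's protocol or API: it uses a plain TCP echo server on localhost as a stand-in for a WebSocket endpoint, and `run_demo` and the `req N` message format are hypothetical names chosen for illustration. It counts how many connections the server has to accept when a client reconnects for every request versus when it reuses one open connection.

```python
import asyncio


async def run_demo(n_requests: int = 5):
    """Compare connect-per-request with one persistent connection (illustrative)."""
    accepted = 0  # number of TCP connections the server has accepted

    async def handle(reader, writer):
        nonlocal accepted
        accepted += 1
        # Echo each newline-terminated request until the client disconnects.
        while line := await reader.readline():
            writer.write(b"echo:" + line)
            await writer.drain()
        writer.close()

    # Port 0 lets the OS pick a free port on localhost.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def request(reader, writer, i):
        writer.write(f"req {i}\n".encode())
        await writer.drain()
        return await reader.readline()

    # Pattern A: open a new connection for every request.
    for i in range(n_requests):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        await request(reader, writer, i)
        writer.close()
        await writer.wait_closed()
    per_request = accepted

    # Pattern B: open one connection and reuse it for every request.
    accepted = 0
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    replies = [await request(reader, writer, i) for i in range(n_requests)]
    writer.close()
    await writer.wait_closed()
    persistent = accepted

    server.close()  # stop accepting new connections
    return per_request, persistent, replies


if __name__ == "__main__":
    per_request, persistent, _ = asyncio.run(run_demo())
    print(f"connections opened, one per request: {per_request}")
    print(f"connections opened, persistent:      {persistent}")
```

Each new connection costs at least a TCP handshake (plus TLS and an HTTP upgrade in a real WebSocket deployment), so reusing one connection removes that setup cost from all but the first request.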

Topics

GPT-5.3 Codex-Spark
AI Code Generation
Cerebras WSE-3
AI Model Performance
Wafer-Scale Engine

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.