OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

2026-02-12 · Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

OpenAI has released Codex-Spark, a new AI coding model developed in partnership with Cerebras that runs on Cerebras' Wafer Scale Engine 3 (WSE-3) chip. It is OpenAI's first product to run on Cerebras hardware, and it generates 1,000 tokens per second at inference. That is fast for a coding model, though Cerebras has reported higher rates on other models: 2,100 tokens per second on Llama 3.1 70B and 3,000 tokens per second on gpt-oss-120B. The release is part of OpenAI's broader strategy to reduce its reliance on Nvidia, following a multi-year deal with AMD in October 2025, a $38 billion cloud computing agreement with Amazon in November 2025, and ongoing work on a custom AI chip to be fabricated by TSMC. The diversification comes as a planned $100 billion Nvidia infrastructure deal has fizzled, with OpenAI reportedly dissatisfied with the speed of Nvidia chips for inference tasks.
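
To put the reported throughput figures in perspective, the short Python sketch below converts tokens per second into approximate generation time for a single response. The three rates come from the summary above; the 600-token response length and the function name are hypothetical, chosen only for illustration. The estimate covers token generation alone and ignores time to first token, network latency, and queueing, so real-world latency would likely be higher.

```python
# Reported generation rates from the summary, in tokens per second.
REPORTED_RATES = {
    "Codex-Spark on Cerebras WSE-3": 1_000,
    "Llama 3.1 70B on Cerebras": 2_100,
    "gpt-oss-120B on Cerebras": 3_000,
}

# Assumption for illustration only: the length of one generated response.
HYPOTHETICAL_OUTPUT_TOKENS = 600


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate the time spent generating tokens, excluding all other latency."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


for name, rate in REPORTED_RATES.items():
    seconds = generation_seconds(HYPOTHETICAL_OUTPUT_TOKENS, rate)
    print(f"{name}: {seconds:.2f} s for {HYPOTHETICAL_OUTPUT_TOKENS} tokens")
```

Swapping in response lengths and rates measured on your own workload gives a first-pass comparison between providers before running a full benchmark.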

Key takeaway

For CTOs and VPs of Engineering evaluating AI infrastructure, OpenAI's move to diversify away from Nvidia with partners like Cerebras and AMD signals a critical shift in the AI hardware landscape. You should assess your own vendor lock-in risks and explore alternative compute providers to optimize for inference speed and cost, especially for latency-sensitive applications like AI coding agents. This trend suggests a more competitive and varied hardware ecosystem is emerging.

Key insights

OpenAI is diversifying its hardware dependencies for AI inference, partnering with Cerebras and AMD to reduce Nvidia reliance.

Principles

Latency is critical for AI coding agent adoption.
Hardware diversification mitigates vendor lock-in.
Inference speed can be prioritized over raw accuracy.

In practice

Evaluate Cerebras WSE-3 for high-speed AI inference.
Consider AMD chips for large-scale AI compute deals.
Explore custom AI chip design for strategic independence.

Topics

AI Coding Agents
OpenAI Codex
Cerebras Wafer Scale Engine
AI Hardware Diversification
LLM Inference Performance

Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.