OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips
Summary
OpenAI has released Codex-Spark, a new AI coding agent developed in partnership with Cerebras that runs on Cerebras' Wafer Scale Engine 3 (WSE-3) chip. It is OpenAI's first product to use Cerebras hardware, and it reaches 1,000 tokens per second at inference. That speed is notable, though Cerebras has reported higher rates on other models: 2,100 tokens per second on Llama 3.1 70B and 3,000 tokens per second on gpt-oss-120B. The release is part of OpenAI's broader strategy to reduce its reliance on Nvidia, following a multi-year deal with AMD in October 2025, a $38 billion cloud computing agreement with Amazon in November, and ongoing work on a custom AI chip to be fabricated by TSMC. The diversification comes as a planned $100 billion Nvidia infrastructure deal has fizzled, with OpenAI reportedly dissatisfied with the speed of Nvidia chips for inference tasks.
Key takeaway
For CTOs and VPs of Engineering evaluating AI infrastructure, OpenAI's move to diversify away from Nvidia with partners like Cerebras and AMD signals a shift in the AI hardware landscape. Assess your own vendor lock-in risks and explore alternative compute providers to optimize inference speed and cost, especially for latency-sensitive applications like AI coding agents. The trend points to a more competitive and varied hardware ecosystem.
Key insights
OpenAI is diversifying its hardware dependencies for AI inference, partnering with Cerebras and AMD to reduce Nvidia reliance.
Principles
- Latency is critical for AI coding agent adoption.
- Hardware diversification mitigates vendor lock-in.
- Inference speed can be prioritized over raw accuracy.
In practice
- Evaluate Cerebras WSE-3 for high-speed AI inference.
- Consider AMD chips for large-scale AI compute deals.
- Explore custom AI chip design for strategic independence.
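When comparing providers on inference speed, it helps to measure throughput the same way for each one. The following is a minimal sketch, not any vendor's method: `measure_throughput` and `fake_stream` are hypothetical names, and `fake_stream` only simulates a streaming response with a fixed delay. In a real evaluation you would pass in whatever token iterator your provider's client returns.

```python
import time
from typing import Iterable, Iterator, Optional


def measure_throughput(token_stream: Iterable[str]) -> dict:
    """Consume a token stream; report time to first token and tokens/second."""
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            # Record when the first token arrives (latency before output starts).
            first_token_at = time.perf_counter()
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "tokens": count,
        "seconds": elapsed,
        "time_to_first_token": (
            first_token_at - start if first_token_at is not None else None
        ),
        "tokens_per_second": count / elapsed if elapsed > 0 else 0.0,
    }


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulated per-token generation delay
        yield f"tok{i}"


if __name__ == "__main__":
    stats = measure_throughput(fake_stream(n_tokens=200, delay_s=0.001))
    print(stats)
```

Reporting time to first token alongside tokens per second matters for coding agents, since a high sustained rate can still feel slow if the first token arrives late. Figures measured this way include network and client overhead, so they are not directly comparable to vendor-reported rates.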
Topics
- AI Coding Agents
- OpenAI Codex
- Cerebras Wafer Scale Engine
- AI Hardware Diversification
- LLM Inference Performance
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.