OpenAI unveils its first custom chip, built by Broadcom
Summary
OpenAI has unveiled "Jalapeño," its first custom-built inference processor, developed in collaboration with Broadcom over the past 18 months. The chip is designed specifically for OpenAI's inference systems, and OpenAI's own AI models assisted in its development. Early testing indicates significantly better performance per watt than existing alternatives. The initiative aims to reduce OpenAI's reliance on Nvidia GPUs, following similar efforts by Google and Amazon. Jalapeño focuses on low operating costs for real-time coding models, though pre-training may still use Nvidia hardware. OpenAI plans to deploy 10 gigawatts of these new systems starting late next year, with rapid expansion over three years, aiming for a total capacity near 30 gigawatts. The company emphasizes optimizing the entire AI stack, from chip architecture to user experience, to make models faster, more reliable, and more affordable, ultimately driving compute abundance.
Key takeaway
For Directors of AI/ML scaling large language model services, OpenAI's Jalapeño chip highlights the shift toward custom silicon for inference. Evaluate vertical integration opportunities and partnerships for purpose-built AI accelerators that can reduce operational costs and improve performance per watt. This approach is essential to meeting escalating demand and achieving compute abundance, moving specific high-volume workloads beyond reliance on general-purpose GPUs.
Key insights
OpenAI's custom chip, Jalapeño, signals a strategic move toward vertical integration to lower AI inference costs and support scale.
Principles
- Vertical integration across the AI stack drives substantial efficiency.
- Custom silicon targets specific workloads for cost-effective AI deployment.
- AI models can accelerate and optimize hardware design processes.
Method
Design custom inference chips with partners like Broadcom, apply AI models to optimize the design, and integrate across the full system stack, including networking and algorithms.
In practice
- Prioritize custom silicon for high-volume inference workloads to reduce operational costs.
- Investigate AI-driven design tools to shorten hardware development cycles.
- Scale compute infrastructure to multi-gigawatt levels to meet future AI service demand.
Topics
- OpenAI
- Broadcom
- Jalapeño
- Inference Chips
- Vertical Integration
- AI Infrastructure
Best for: Investor, VP of Engineering/Data, AI Architect, AI Hardware Engineer, Director of AI/ML, CTO
Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.