OpenAI, Broadcom debut custom Jalapeño chip for AI inference
Summary
OpenAI Group PBC has unveiled Jalapeño, a custom AI inference chip developed with Broadcom Inc. to run its large language models. Unlike Nvidia's general-purpose Rubin GPUs, Jalapeño is designed solely for inference workloads. Early testing suggests it delivers significantly higher performance per watt than current leading solutions. OpenAI has disclosed few design details, but the architecture aims to reduce data movement, a common bottleneck in inference, possibly by relying on a large amount of on-chip SRAM.
OpenAI's Jalapeño-powered inference clusters will use Broadcom networking technologies, including the Tomahawk 6 switch chip series, which can process up to 1.6 terabits of traffic per second. OpenAI plans to deploy the custom server racks, built with Celestia Inc., by year-end and describes Jalapeño as the first step in a multi-generation compute platform. The initiative could open new revenue streams for OpenAI, for example by selling Jalapeño-powered appliances, and could boost investor interest ahead of its public offering.
Key takeaway
If you are an AI architect evaluating infrastructure for large language models, consider the strategic advantages of specialized inference hardware. OpenAI's Jalapeño suggests that custom silicon can deliver better performance per watt, which could significantly reduce operating costs. The shift points to a future in which you combine purpose-built chips with high-speed networking for specific AI workloads instead of relying solely on general-purpose GPUs. Be prepared to assess custom hardware options for your next-generation AI deployments.
Key insights
Custom AI inference chips such as OpenAI's Jalapeño can offer performance-per-watt gains and strategic market differentiation.
Principles
- Reducing data movement improves inference performance.
- Inference-specific chip design boosts efficiency.
- Custom hardware provides market differentiation.
In practice
- Evaluate custom silicon for inference workloads.
- Utilize high-speed networking in AI clusters.
- Investigate on-premises AI model deployment.
Topics
- OpenAI
- AI Inference Chips
- Custom Silicon
- Broadcom Networking
- Large Language Models
- Data Center Infrastructure
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Hardware Engineer, AI Architect, Investor
Editorial summary, takeaway, and curation by AIssential. Original article published by AI – SiliconANGLE.