OpenAI, Broadcom debut custom Jalapeño chip for AI inference

2026-06-24 · Source: AI – SiliconANGLE · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

OpenAI Group PBC has unveiled Jalapeño, a custom AI inference chip developed with Broadcom Inc. to run its large language models. Unlike Nvidia's general-purpose Rubin GPUs, Jalapeño is designed solely for inference workloads. Early testing suggests it achieves significantly higher performance per watt than current leading solutions. Design details are limited, but the architecture aims to reduce data movement, a common inference bottleneck, possibly by relying on large amounts of onboard SRAM. OpenAI's Jalapeño-powered inference clusters will use Broadcom networking technology, including the Tomahawk 6 chip series, which can process up to 1.6 terabits of traffic per second. OpenAI plans to deploy these custom server racks, developed with Celestica Inc., by year-end, and views Jalapeño as the first step in a multi-generation compute platform. The initiative could open new revenue streams for OpenAI, for example through sales of Jalapeño-powered appliances, and boost investor interest ahead of its public offering.

Key takeaway

For AI architects evaluating infrastructure for large language models, specialized inference hardware deserves a close look. OpenAI's early results with Jalapeño suggest that custom silicon can deliver higher performance per watt than general-purpose GPUs, which could significantly lower operating costs. The shift points toward deployments that pair purpose-built chips with high-speed networking for specific AI workloads instead of relying on general-purpose GPUs alone. Expect to assess custom hardware options when planning next-generation AI deployments.
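To make the performance-per-watt argument concrete, the sketch below converts an accelerator's throughput and power draw into an electricity cost per million generated tokens. It is a minimal illustration under stated assumptions: the function names are hypothetical, the throughput, wattage and electricity price are placeholders rather than published Jalapeño or Nvidia figures, and it ignores cooling, networking and hardware amortization.

```python
# Hypothetical illustration of how performance per watt maps to energy cost.
# All numbers are placeholders, not measurements of any real chip.

def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Performance per watt, expressed as tokens generated per joule."""
    return tokens_per_second / watts


def energy_cost_per_million_tokens(tokens_per_second: float, watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost (USD) for the accelerator alone to generate 1M tokens."""
    joules = 1_000_000 / tokens_per_joule(tokens_per_second, watts)
    kwh = joules / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh * usd_per_kwh


# Placeholder accelerators: identical throughput, different power draw.
baseline = energy_cost_per_million_tokens(10_000, 1_000, usd_per_kwh=0.10)
custom = energy_cost_per_million_tokens(10_000, 500, usd_per_kwh=0.10)
print(f"baseline: ${baseline:.4f}  custom: ${custom:.4f}  "
      f"ratio: {baseline / custom:.1f}x")
```

Because energy cost scales inversely with tokens per joule, doubling performance per watt halves the per-token electricity bill at the same throughput; real total cost of ownership also depends on chip price, utilization and data-center overheads.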

Key insights

Custom AI inference chips, like OpenAI's Jalapeño, can offer performance-per-watt gains and strategic market differentiation.

Principles

Data movement reduction optimizes inference performance.
Inference-specific chip design boosts efficiency.
Custom hardware provides market differentiation.

In practice

Evaluate custom silicon for inference workloads.
Utilize high-speed networking in AI clusters.
Investigate on-premises AI model deployment.

Topics

OpenAI
AI Inference Chips
Custom Silicon
Broadcom Networking
Large Language Models
Data Center Infrastructure

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Hardware Engineer, AI Architect, Investor

Editorial summary, takeaway, and curation by AIssential. Original article published by AI – SiliconANGLE.