OpenAI and Broadcom unveil LLM-optimized inference chip
Summary
On June 24, 2026, OpenAI and Broadcom (NASDAQ: AVGO) unveiled Jalapeño, OpenAI's first Intelligence Processor designed specifically for LLM inference. The accelerator is the first component of a multi-generation compute platform developed with Broadcom and Celestica, intended to make advanced AI faster, more reliable, and more accessible. Jalapeño was architected from scratch around OpenAI's understanding of LLM fundamentals, including its model roadmap and product needs. Early testing indicates it will deliver substantially better performance per watt than current state-of-the-art hardware, a gain attributed to optimized data movement and balanced resources. The chip went from design to manufacturing tape-out in nine months, a timeline accelerated in part by OpenAI's own models. Deployment at gigawatt scale with data center partners is slated to begin by the end of 2026.
Key takeaway
For Directors of AI/ML evaluating future infrastructure investments, OpenAI's Jalapeño chip signals a critical shift towards specialized hardware for LLM inference. You should prioritize solutions that offer superior performance per watt and are designed with full-stack optimization in mind. This approach can significantly reduce operational costs and enhance the reliability and speed of your interactive AI products, making advanced models more accessible and affordable for your users.
Key insights
OpenAI and Broadcom co-developed Jalapeño, a specialized LLM inference chip, to optimize AI infrastructure for efficiency and broader access.
Principles
- Full-stack control optimizes AI model performance.
- AI models can accelerate hardware design cycles.
- Specialized hardware improves LLM inference efficiency.
Method
Co-develop hardware and software, optimizing architecture for LLM kernels, memory, and networking to reduce data movement and balance resources.
In practice
- Design chips around specific LLM inference needs.
- Optimize for performance per watt in data centers.
- Integrate custom accelerators for interactive AI products.
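To make "performance per watt" concrete for infrastructure comparisons, here is a minimal Python sketch. It assumes throughput is measured in tokens per second and power in watts, so the ratio is tokens per joule. The function names, the electricity price, and every number in the usage lines are hypothetical placeholders for illustration. They are not Jalapeño figures, which the source does not disclose.

```python
def perf_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Return tokens generated per joule (tokens/s divided by watts)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def energy_cost_per_million_tokens(
    tokens_per_second: float, power_watts: float, usd_per_kwh: float
) -> float:
    """Return the electricity cost in USD of generating one million tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    joules_per_token = power_watts / tokens_per_second
    # 1 kWh = 3,600,000 joules
    kwh_per_million_tokens = joules_per_token * 1_000_000 / 3_600_000
    return kwh_per_million_tokens * usd_per_kwh


# Hypothetical accelerators, for illustration only (not measured data).
baseline = perf_per_watt(tokens_per_second=10_000, power_watts=1_000)
candidate = perf_per_watt(tokens_per_second=12_000, power_watts=800)
improvement = candidate / baseline  # relative gain in tokens per joule

# Assumed electricity price of 0.10 USD per kWh.
baseline_cost = energy_cost_per_million_tokens(10_000, 1_000, usd_per_kwh=0.10)
```

Comparing tokens per joule rather than raw throughput keeps the evaluation tied to the power budget, which is typically the binding constraint at data center scale.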
Topics
- LLM Inference
- AI Accelerators
- Jalapeño Chip
- Broadcom
- Full-Stack AI
- Data Center Infrastructure
- Performance per Watt
Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by OpenAI News.