Inference is giving AI chip startups a second chance to make their mark
Summary
The AI industry is shifting focus from model training to inference, creating new opportunities for chip startups to challenge Nvidia's dominance. Inference workloads are diverse, allowing specialized hardware to excel in specific tasks. Nvidia's $20 billion acquihire of Groq in December exemplifies this trend, combining Groq's LPUs for bandwidth-constrained decode operations with Nvidia GPUs for compute-heavy prefill. AWS and Intel are also adopting disaggregated compute platforms, pairing their own accelerators with Cerebras Systems' wafer-scale accelerators and SambaNova's RDUs, respectively. Lumai recently detailed its optical inference accelerator, which uses light for matrix multiplication and targets one exaOPS of AI performance within a 10 kW power budget by 2029. Tenstorrent, by contrast, rejects the disaggregated inference model and is pursuing a unified RISC-V-based approach with its Galaxy Blackhole platforms.
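To put Lumai's stated target in perspective, the short calculation below converts it into an implied efficiency figure. It takes the two headline numbers at face value (one exaOPS, 10 kW) and assumes both apply to the same system at the same time; the numeric precision of the operations is not given in this summary, so the result is only a rough yardstick.

```python
# Implied efficiency of the stated target: 1 exaOPS within a 10 kW budget.
# Assumption: both headline figures describe the same system simultaneously.
EXA = 10**18
TERA = 10**12

target_ops_per_second = 1 * EXA
power_budget_watts = 10 * 10**3  # 10 kW

# A watt is one joule per second, so ops/s divided by W gives ops per joule.
ops_per_joule = target_ops_per_second / power_budget_watts
tops_per_watt = ops_per_joule / TERA

print(f"{tops_per_watt:.0f} TOPS per watt")  # prints "100 TOPS per watt"
```

In other words, the target works out to roughly 100 tera-operations per second for every watt consumed.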
Key takeaway
For CTOs evaluating AI infrastructure, the increasing heterogeneity of inference workloads suggests a strategic shift towards disaggregated compute architectures. You should assess whether specialized accelerators for decode operations, like those from Groq or Cerebras, can offer significant performance or power efficiency gains over a purely GPU-centric approach. Additionally, monitor emerging technologies such as Lumai's optical accelerators for future compute-bound inference needs.
Key insights
The shift to AI inference creates diverse hardware opportunities, favoring specialized and disaggregated compute architectures.
Principles
- Inference workloads are highly heterogeneous.
- Disaggregated compute optimizes specific inference stages.
Method
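Disaggregated inference pipelines split each request into two stages and run them on different hardware: compute-heavy prefill goes to GPUs or AWS's Trainium, while bandwidth-constrained decode goes to specialized accelerators such as Groq's LPUs or Cerebras' wafer-scale chips.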
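The split described above can be sketched as a simple stage planner. This is a minimal illustration, not any vendor's scheduler: the `Stage` class, the `plan_request` function, and the pool labels are all hypothetical names introduced here, and a real system would also need to move intermediate state between the two pools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str    # "prefill" or "decode"
    pool: str    # hypothetical hardware pool label
    tokens: int  # number of tokens handled in this stage


# Hypothetical pool labels; a real deployment would use its own names.
STAGE_TO_POOL = {
    "prefill": "gpu-or-trainium",    # compute-heavy stage
    "decode": "lpu-or-wafer-scale",  # bandwidth-constrained stage
}


def plan_request(prompt_tokens: int, max_new_tokens: int) -> list[Stage]:
    """Split one request into a prefill stage followed by a decode stage."""
    if prompt_tokens < 1 or max_new_tokens < 0:
        raise ValueError("prompt_tokens must be >= 1 and max_new_tokens >= 0")
    stages = [Stage("prefill", STAGE_TO_POOL["prefill"], prompt_tokens)]
    if max_new_tokens > 0:
        # Decode only runs when the request asks for generated tokens.
        stages.append(Stage("decode", STAGE_TO_POOL["decode"], max_new_tokens))
    return stages


if __name__ == "__main__":
    for stage in plan_request(prompt_tokens=2048, max_new_tokens=256):
        print(f"{stage.name:8s} -> {stage.pool} ({stage.tokens} tokens)")
```

Keeping the stage-to-pool mapping in one table makes it easy to compare a disaggregated layout against a unified one: pointing both stages at the same pool models the single-architecture approach.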
In practice
- Evaluate specialized accelerators for decode operations.
- Consider optical computing for power-efficient inference.
Topics
- AI Inference
- AI Chip Startups
- Disaggregated AI Architectures
- Optical Inference Accelerators
- RISC-V Compute Platforms
Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Register: Enterprise Technology News and Analysis.