Inference is giving AI chip startups a second chance to make their mark

· Source: The Register: Enterprise Technology News and Analysis · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, short

Summary

The AI industry is shifting focus from model training to inference, creating new opportunities for chip startups to challenge Nvidia's dominance. Inference workloads are heterogeneous, so specialized hardware can excel at specific stages of the pipeline. Nvidia's $20 billion acquihire of Groq in December exemplifies this trend, pairing Groq's LPUs for bandwidth-constrained decode with Nvidia GPUs for compute-heavy prefill. AWS and Intel are also adopting disaggregated compute platforms, pairing their own accelerators with Cerebras Systems' wafer-scale accelerators and SambaNova's RDUs, respectively. Lumai recently detailed its optical inference accelerator, which uses light to perform matrix multiplication and aims to deliver an exaOPS of AI performance within a 10 kW power budget by 2029. Tenstorrent, by contrast, is pursuing a unified RISC-V-based approach with its Galaxy Blackhole platforms, rejecting the disaggregated inference model.
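For scale, Lumai's 2029 target works out to roughly 100 TOPS per watt. This assumes that "an exaOPS" means 10^18 operations per second sustained at the full 10 kW budget:

```latex
\frac{1\ \text{exaOPS}}{10\ \text{kW}}
  = \frac{10^{18}\ \text{ops/s}}{10^{4}\ \text{J/s}}
  = 10^{14}\ \text{ops/J}
  = 100\ \text{TOPS/W}
```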

Key takeaway

For CTOs evaluating AI infrastructure, the increasing heterogeneity of inference workloads suggests a strategic shift towards disaggregated compute architectures. You should assess whether specialized accelerators for decode operations, like those from Groq or Cerebras, can offer significant performance or power efficiency gains over a purely GPU-centric approach. Additionally, monitor emerging technologies such as Lumai's optical accelerators for future compute-bound inference needs.
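A roofline estimate is one rough way to see why decode is the stage where specialized accelerators can pay off. The sketch below compares arithmetic intensity (FLOPs per byte of memory traffic) for prefill and decode against a hypothetical accelerator with 1 PFLOP/s of 16-bit compute and 4 TB/s of memory bandwidth. Those figures, the 8,192-wide layer, and the 2,048-token prompt are illustrative assumptions, not specs for any chip named here. Activation and KV-cache traffic are ignored.

```python
"""Roofline sketch: why prefill tends to be compute-bound and decode bandwidth-bound.

Illustrative assumptions (not vendor specs):
- one dense d x d weight matrix stored in 16-bit precision (2 bytes/element)
- weights are read from memory once per forward pass over a batch of tokens
- activation and KV-cache traffic are ignored
"""

BYTES_PER_PARAM = 2  # fp16/bf16 weights


def arithmetic_intensity(d: int, tokens: int) -> float:
    """FLOPs per byte of weight traffic for a (tokens x d) @ (d x d) matmul."""
    flops = 2 * tokens * d * d              # one multiply + one add per MAC
    bytes_moved = BYTES_PER_PARAM * d * d   # weight reads dominate traffic here
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline: throughput is capped by peak compute or by bandwidth * intensity."""
    return min(peak_flops, mem_bw * intensity)


def bound(intensity: float, peak_flops: float, mem_bw: float) -> str:
    """Classify a workload relative to the ridge point of the roofline."""
    ridge = peak_flops / mem_bw  # FLOPs/byte at which the two limits meet
    return "compute-bound" if intensity >= ridge else "bandwidth-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 1 PFLOP/s dense 16-bit compute, 4 TB/s memory bandwidth.
    PEAK, BW = 1e15, 4e12
    d = 8192
    for phase, tokens in [("prefill (2,048-token prompt)", 2048), ("decode (1 token)", 1)]:
        ai = arithmetic_intensity(d, tokens)
        print(f"{phase}: {ai:.0f} FLOP/byte, "
              f"{attainable_flops(ai, PEAK, BW) / 1e12:.0f} TFLOP/s, {bound(ai, PEAK, BW)}")
```

Run as a script, it reports prefill at 2048 FLOP/byte (compute-bound, capped at the 1000 TFLOP/s peak) and decode at 1 FLOP/byte (bandwidth-bound, about 4 TFLOP/s). Batching many decode requests raises weight reuse, but each sequence still reads its own KV cache. That is part of why decode tends to reward memory bandwidth rather than raw FLOPs.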

Key insights

The shift to AI inference creates diverse hardware opportunities, favoring specialized and disaggregated compute architectures.

Principles

Method

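Disaggregated inference pipelines assign compute-heavy prefill (processing the full prompt in parallel) to GPUs or AWS Trainium, and bandwidth-constrained decode (generating output one token at a time) to specialized accelerators such as Groq's LPUs or Cerebras' wafer-scale chips.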
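The routing described above can be sketched as a two-pool pipeline. Everything here is hypothetical. `PrefillWorker`, `DecodeWorker`, `KVCache`, and `serve()` are illustrative names, not a Groq, Cerebras, AWS, or Nvidia API. A toy deterministic function stands in for the model so that the control flow runs end to end. The sketch shows the hand-off: prefill builds the KV cache and emits the first token, then the cache moves to the decode pool, which extends it one token per step.

```python
"""Minimal sketch of a disaggregated inference pipeline.

All names are hypothetical stand-ins, not any vendor's API. The "model" is a
toy function so that the prefill -> KV-cache hand-off -> decode flow can run.
"""
from dataclasses import dataclass, field


@dataclass
class KVCache:
    # Stand-in for per-layer key/value tensors; here it just holds the token history.
    tokens: list[int] = field(default_factory=list)


def toy_next_token(history: list[int]) -> int:
    # Toy deterministic "model": next token = sum of history mod 100.
    return sum(history) % 100


class PrefillWorker:
    """Would run on the compute-heavy pool (GPUs or Trainium in the article)."""

    def prefill(self, prompt: list[int]) -> tuple[KVCache, int]:
        # The whole prompt is processed in one pass: large matmuls, compute-bound.
        cache = KVCache(tokens=list(prompt))
        first = toy_next_token(cache.tokens)
        cache.tokens.append(first)
        return cache, first


class DecodeWorker:
    """Would run on the bandwidth-optimized pool (LPUs or wafer-scale chips)."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        out = []
        for _ in range(max_new_tokens):
            # One token per step; real hardware re-reads weights and KV cache each step.
            nxt = toy_next_token(cache.tokens)
            cache.tokens.append(nxt)
            out.append(nxt)
        return out


def serve(prompt: list[int], max_new_tokens: int,
          prefill_pool: PrefillWorker, decode_pool: DecodeWorker) -> list[int]:
    """Route prefill and decode to separate pools and return the generated tokens."""
    if max_new_tokens <= 0:
        return []
    cache, first = prefill_pool.prefill(prompt)
    # In a real deployment, the KV cache would be transferred over the interconnect here.
    rest = decode_pool.decode(cache, max_new_tokens - 1)
    return [first] + rest


if __name__ == "__main__":
    print(serve([1, 2, 3], 4, PrefillWorker(), DecodeWorker()))  # [6, 12, 24, 48]
```

In a real deployment, the KV-cache transfer between pools is the overhead a disaggregated design must recoup through faster or more power-efficient decode. A unified approach like Tenstorrent's avoids that hand-off entirely.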

Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Register: Enterprise Technology News and Analysis.