Inference is giving AI chip startups a second chance to make their mark

2026-05-03 · Source: The Register: Enterprise Technology News and Analysis · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, short

Summary

The AI industry is shifting focus from model training to inference, creating new opportunities for chip startups to challenge Nvidia's dominance. Inference workloads are diverse, which lets specialized hardware excel at specific tasks. Nvidia's $20 billion acquihire of Groq in December exemplifies the trend: it pairs Groq's LPUs, used for bandwidth-constrained decode operations, with Nvidia GPUs for compute-heavy prefill. AWS and Intel are also adopting disaggregated compute platforms, pairing their own accelerators with Cerebras Systems' wafer-scale accelerators and SambaNova's RDUs, respectively. Lumai recently detailed its optical inference accelerator, which performs matrix multiplication with light and targets one exaOPS of AI performance within a 10kW power budget by 2029. Tenstorrent, by contrast, rejects the disaggregated inference model and is pursuing a unified RISC-V-based approach with its Galaxy Blackhole platforms.

Key takeaway

For CTOs evaluating AI infrastructure, the increasing heterogeneity of inference workloads suggests a strategic shift towards disaggregated compute architectures. You should assess whether specialized accelerators for decode operations, like those from Groq or Cerebras, can offer significant performance or power efficiency gains over a purely GPU-centric approach. Additionally, monitor emerging technologies such as Lumai's optical accelerators for future compute-bound inference needs.

Key insights

The shift to AI inference creates diverse hardware opportunities, favoring specialized and disaggregated compute architectures.

Principles

Inference workloads are highly heterogeneous.
Disaggregated compute optimizes specific inference stages.

Method

Disaggregated inference pipelines assign compute-heavy prefill to GPUs or AWS's Trainium accelerators, and bandwidth-constrained decode to specialized accelerators such as LPUs or wafer-scale chips.
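
A rough way to see why the two stages suit different hardware is to compare their arithmetic intensity (operations per byte moved from memory). The sketch below is illustrative only and is not taken from the original article: the model size, prompt length, ridge point, and pool names are hypothetical, and the cost model (about 2 operations per parameter per token, weights read once per forward pass, KV cache and batching ignored) is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One inference stage described by its compute and memory traffic."""
    name: str
    flops: float        # operations performed in one forward pass
    bytes_moved: float  # bytes of weights read in one forward pass

    @property
    def intensity(self) -> float:
        """Arithmetic intensity in operations per byte."""
        return self.flops / self.bytes_moved


def forward_pass(name: str, params: float, tokens: int,
                 bytes_per_param: float = 2.0) -> Stage:
    """Simplified cost of one forward pass over `tokens` tokens.

    Assumes ~2 operations per parameter per token and that the weights
    are read from memory once, regardless of how many tokens are processed.
    """
    return Stage(name, 2.0 * params * tokens, params * bytes_per_param)


def assign_pool(stage: Stage, ridge_flops_per_byte: float) -> str:
    """Route a stage to a (hypothetical) hardware pool by roofline position."""
    if stage.intensity >= ridge_flops_per_byte:
        return "compute_pool"    # e.g. GPUs: the stage is compute-bound
    return "bandwidth_pool"      # e.g. decode accelerators: memory-bound


# Hypothetical numbers, chosen only to make the contrast visible.
PARAMS = 70e9                  # 70B-parameter model, 16-bit weights
PROMPT_TOKENS = 2048           # prefill processes the whole prompt at once
RIDGE_FLOPS_PER_BYTE = 300.0   # assumed ridge point of the compute pool

prefill = forward_pass("prefill", PARAMS, PROMPT_TOKENS)
decode = forward_pass("decode", PARAMS, 1)  # one new token per step

for stage in (prefill, decode):
    pool = assign_pool(stage, RIDGE_FLOPS_PER_BYTE)
    print(f"{stage.name}: {stage.intensity:.0f} ops/byte -> {pool}")
```

Under these assumptions prefill reuses each weight across every prompt token, so it lands on the compute-bound side, while single-token decode rereads the full weight set for very little arithmetic and lands on the bandwidth-bound side. Real deployments batch decode requests and must also move the KV cache between pools, which this sketch leaves out.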

In practice

Evaluate specialized accelerators for decode operations.
Track optical computing, such as Lumai's accelerator, as a possible route to power-efficient inference.
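
To put the optical target in perspective, the stated goal of one exaOPS within 10kW can be converted into an efficiency figure. The short calculation below uses only those two numbers from the summary; the function name is an illustrative choice, and the result is the efficiency the target implies, not a measured figure.

```python
def ops_per_watt(ops_per_second: float, power_watts: float) -> float:
    """Operations per second delivered per watt of power drawn."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return ops_per_second / power_watts


EXA = 1e18
TERA = 1e12

# Stated target: 1 exaOPS within a 10 kW power budget.
target = ops_per_watt(1 * EXA, 10_000.0)
target_tops_per_watt = target / TERA

print(f"Implied efficiency: {target_tops_per_watt:.0f} TOPS/W")
```

The same function can be applied to the published throughput and power figures of any accelerator under evaluation, provided both are quoted at the same numeric precision.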

Topics

AI Inference
AI Chip Startups
Disaggregated AI Architectures
Optical Inference Accelerators
RISC-V Compute Platforms

Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Register: Enterprise Technology News and Analysis.