Jim Keller: ‘AI Still Obeys the Old Laws of Compute’

2026-06-25 · Source: Big Data & AI News - EE Times · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Tenstorrent CEO Jim Keller argues that AI computation still obeys the established "old laws of compute," and that performance depends on balancing compute, memory, and I/O. Following its TT-Deploy event, Tenstorrent demonstrated its BlackHole Galaxy server, claiming it can run DeepSeek-671B inference at up to 350 tokens per second per user on 16 Galaxy servers (512 chips) at batch 32. Keller presents Tenstorrent's architecture, with 56 Ethernet ports per box, as better suited than disaggregated inference approaches to splitting large tensors and keeping KV caches on the same chips that run decode. The company also offers a PCIe card to accelerate existing GPU deployments, which it says doubles or triples token rates for customers. Tenstorrent is building 1,000 Galaxy servers, with over half already sold, and aims for an IPO.
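The headline figures can be unpacked with a little arithmetic. The sketch below is a back-of-envelope reading, not a Tenstorrent benchmark: it assumes that "batch 32" means 32 concurrent users and that every user reaches the quoted peak of 350 tokens per second, so the aggregate number is an upper bound at best.

```python
# Figures quoted in the summary.
servers = 16
chips = 512
batch = 32                    # ASSUMPTION: one batch slot per concurrent user
tokens_per_s_per_user = 350   # quoted as an "up to" figure

chips_per_server = chips // servers                      # 32 chips per Galaxy server
aggregate_upper_bound = tokens_per_s_per_user * batch    # 11200 tokens/s across all users
per_chip_upper_bound = aggregate_upper_bound / chips     # 21.875 tokens/s per chip

print(f"chips per server:        {chips_per_server}")
print(f"aggregate tokens/s (<=): {aggregate_upper_bound}")
print(f"tokens/s per chip (<=):  {per_chip_upper_bound}")
```

Under these assumptions, the low per-chip number shows that the configuration is tuned for per-user speed rather than for total throughput per chip.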

Key takeaway

For AI architects evaluating large language model inference infrastructure, Tenstorrent's BlackHole Galaxy servers are a credible alternative to GPU-centric or disaggregated solutions. Your team could achieve high token rates and potentially lower hardware costs by adopting Tenstorrent's integrated compute, memory, and I/O architecture. Consider piloting Galaxy servers for new deployments or using the PCIe cards to boost existing GPU clusters, especially if you face long Nvidia lead times.

Key insights

AI inference performance hinges on balancing compute, memory, and I/O, not on any new laws of compute.

Principles

Rent's Rule: I/O grows sub-linearly with logic.
Amdahl's Law applies to agentic computing.
Balance DRAM, SRAM, computation, and the network-on-chip (NoC).
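The first two principles have standard closed forms, sketched below. Rent's Rule is usually written T = t * g^p, where T is the number of I/O terminals, g the number of logic gates, and p < 1 the Rent exponent. Amdahl's Law gives the speedup 1 / ((1 - f) + f / n) when a fraction f of the work is spread across n units. The values t = 4.0, p = 0.6, and f = 0.95 are illustrative assumptions, not figures from the article.

```python
def rent_terminals(gates: int, t: float = 4.0, p: float = 0.6) -> float:
    """Rent's Rule: T = t * g**p. t and p are illustrative, not measured values."""
    return t * gates ** p


def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's Law: speedup when parallel_fraction of the work runs on n units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


if __name__ == "__main__":
    # 10x more logic yields only about 4x more I/O terminals when p = 0.6.
    for gates in (10**6, 10**7, 10**8):
        print(f"gates={gates:>11,}  terminals~{rent_terminals(gates):,.0f}")

    # With 5% of the work serial, speedup saturates below 20x however many units are added.
    for n in (8, 64, 512):
        print(f"units={n:>3}  speedup={amdahl_speedup(0.95, n):.2f}")
```

Both curves flatten: I/O lags behind logic growth, and whatever part of a workload cannot be parallelized caps the benefit of adding chips.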

Method

Tenstorrent's architecture splits large tensors across hundreds of chips, connected through 56 Ethernet ports per box, and keeps the KV cache in DRAM on those same chips for fast decode.
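To make "splitting large tensors" concrete, here is a minimal, generic sketch of column-wise tensor parallelism in plain Python. It is not Tenstorrent's software stack: the function names are hypothetical, each "chip" is just a loop iteration, and the final concatenation stands in for the traffic that a chip-to-chip interconnect would carry.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w: Matrix, num_chips: int) -> List[Matrix]:
    """Split a weight matrix into equal contiguous column shards, one per chip."""
    n = len(w[0])
    if n % num_chips != 0:
        raise ValueError("column count must divide evenly across chips")
    width = n // num_chips
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_chips)]


def sharded_matmul(x: Matrix, w: Matrix, num_chips: int) -> Matrix:
    """Each simulated chip multiplies x by its own shard; results are joined column-wise."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_chips)]
    # Concatenating row r of every partial result models the gather over the interconnect.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    print(sharded_matmul(x, w, num_chips=2) == matmul(x, w))
```

Each chip holds only its slice of the weights, so per-chip memory drops as chips are added, while the amount of data exchanged between chips becomes the quantity to balance.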

In practice

Deploy Galaxy servers for LLM inference.
Use PCIe cards to accelerate existing GPUs.
Integrate RISC-V CPU IP for edge AI.

Topics

AI Inference
Tenstorrent BlackHole Galaxy
LLM Acceleration
Compute Architecture
Rent's Rule
Disaggregated Inference
RISC-V IP

Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.