Jim Keller: ‘AI Still Obeys the Old Laws of Compute’
Summary
Tenstorrent CEO Jim Keller asserts that AI computation still obeys the established "old laws of compute," and that performance depends on balancing compute, memory, and I/O. Following its TT-Deploy event, Tenstorrent demonstrated its BlackHole Galaxy server, claiming it can run inference on DeepSeek-671B at up to 350 tokens per second per user using 16 Galaxy servers (512 chips) at batch 32. Keller highlights Tenstorrent's architecture, which has 56 Ethernet ports per box, as better suited than disaggregated inference approaches to splitting large tensors and keeping KV caches on the same chips that do the compute. The company also offers a PCIe card that accelerates existing GPU deployments, which it says doubles or triples token rates for customers. Tenstorrent is building 1,000 Galaxy servers, more than half of which are already sold, and is aiming for an IPO.
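The headline figures can be unpacked with a little arithmetic. The sketch below is a back-of-envelope calculation only. It assumes that "batch 32" means 32 concurrent users, each served at the quoted per-user rate, which the summary does not state explicitly, and it treats 350 tokens per second as the peak ("up to") value. The variable names are illustrative.

```python
# Back-of-envelope arithmetic on the quoted Galaxy figures.
# Assumption (not from the source): batch 32 means 32 concurrent users,
# each served at the quoted peak per-user rate.

SERVERS = 16
CHIPS = 512
TOKENS_PER_SEC_PER_USER = 350  # quoted "up to" figure
BATCH = 32

chips_per_server = CHIPS // SERVERS
aggregate_tokens_per_sec = TOKENS_PER_SEC_PER_USER * BATCH
tokens_per_sec_per_chip = aggregate_tokens_per_sec / CHIPS
ms_per_token_per_user = 1000 / TOKENS_PER_SEC_PER_USER

print(f"chips per server:          {chips_per_server}")
print(f"aggregate tokens/s (peak): {aggregate_tokens_per_sec}")
print(f"tokens/s per chip (peak):  {tokens_per_sec_per_chip:.3f}")
print(f"ms per token per user:     {ms_per_token_per_user:.2f}")
```

Under these assumptions the 512-chip system would peak at roughly 11,200 tokens per second in aggregate, or about 22 tokens per second per chip, with each user seeing a new token about every 2.9 ms.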
Key takeaway
For AI architects evaluating large language model inference infrastructure, Tenstorrent's BlackHole Galaxy servers present a compelling alternative to GPU-centric or disaggregated solutions. Your team could achieve high token rates and potentially lower hardware costs by adopting Tenstorrent's integrated compute, memory, and I/O architecture. Consider piloting Galaxy servers for new deployments or using the company's PCIe cards to boost existing GPU clusters, especially if you are facing long lead times for Nvidia hardware.
Key insights
AI inference performance hinges on balancing compute, memory, and I/O, not on new computational laws.
Principles
- Rent's Rule: I/O grows sub-linearly with logic.
- Amdahl's Law applies to agentic computing.
- Balance DRAM, SRAM, computation, and NoC.
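The first two principles have standard textbook forms, sketched here for reference. Rent's Rule is usually written T = t * g^p, where g is the number of logic blocks, T the number of external terminals, and p < 1 the Rent exponent. Amdahl's Law gives the overall speedup as 1 / ((1 - f) + f / s) when a fraction f of the work is accelerated by a factor s. The default values for `t` and `p` below are illustrative placeholders, not figures from the article.

```python
def rent_terminals(gates: float, t: float = 4.0, p: float = 0.6) -> float:
    """Rent's Rule: external terminals T = t * gates**p.

    t and p are illustrative placeholders; with p < 1, I/O grows
    sub-linearly with the amount of logic.
    """
    return t * gates ** p


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's Law: overall speedup when only part of the work is sped up."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Doubling the logic less than doubles the I/O when p < 1.
io_growth = rent_terminals(2e6) / rent_terminals(1e6)
print(f"I/O growth for 2x logic: {io_growth:.3f}x")

# Speeding up 90% of a workload by 10x yields far less than 10x overall.
print(f"overall speedup: {amdahl_speedup(0.9, 10):.2f}x")
```

The second result is the reason the unaccelerated portion of an agentic pipeline matters: however fast the accelerated 90% becomes, the overall speedup in this example can never reach 10x.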
Method
Tenstorrent's architecture splits large tensors across hundreds of chips using the 56 Ethernet ports on each box, and keeps the KV cache in DRAM on those same chips for fast decode.
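To illustrate what splitting a tensor across chips means, here is a minimal sketch of generic column-wise sharding of a weight matrix. It is not Tenstorrent's actual partitioning scheme or software API. The function names are hypothetical, each "chip" is simulated by a loop iteration, and the final concatenation stands in for the gather that real hardware would perform over its interconnect.

```python
def matvec(x, w):
    """Multiply row vector x (length k) by matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


def shard_columns(w, num_chips):
    """Split w's columns into num_chips contiguous, near-equal shards."""
    base, extra = divmod(len(w[0]), num_chips)
    shards, start = [], 0
    for chip in range(num_chips):
        width = base + (1 if chip < extra else 0)
        shards.append([row[start:start + width] for row in w])
        start += width
    return shards


def sharded_matvec(x, w, num_chips):
    """Each simulated chip computes its slice of the output, then slices are joined."""
    out = []
    for shard in shard_columns(w, num_chips):
        out.extend(matvec(x, shard))  # stands in for a gather over the interconnect
    return out


x = [1, 2]
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
print(matvec(x, w))
print(sharded_matvec(x, w, 2))
```

Both calls print the same vector, which shows that each chip needs only its own slice of the weights plus the shared input. The cost moves to the links that distribute the input and collect the output, which is why the number of ports per box matters.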
In practice
- Deploy Galaxy servers for LLM inference.
- Use PCIe cards to accelerate existing GPUs.
- Integrate RISC-V CPU IP for edge AI.
Topics
- AI Inference
- Tenstorrent BlackHole Galaxy
- LLM Acceleration
- Compute Architecture
- Rent's Rule
- Disaggregated Inference
- RISC-V IP
Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.