Tenstorrent Unveils Next-Gen Servers for Fast Tokens, No Disaggregation Needed

2026-04-28 · Source: Big Data & AI News - EE Times · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Tenstorrent is launching its Galaxy Blackhole server and cluster offering, designed for efficient AI inference without disaggregation, that is, without splitting the prefill and decode stages across separate hardware. Each 6U server holds 32 Tenstorrent Blackhole chips, delivers 23 PFLOPS (Block FP8), and handles both prefill and decode. Galaxy Blackhole clusters target rapid token generation, including a "Blitz Mode" for DeepSeek-671B inference that reaches up to 350 tokens per second per user with a sub-4-second time to first token. Each server provides 6.2 GB of on-chip SRAM, 1 TB of DRAM, and up to 56 x 800G Ethernet ports for scale-out. Tenstorrent emphasizes its general-purpose approach to AI computing, in contrast to the trend toward specialized hardware, and has developed a complex software stack with a compiler achieving 80-90% run rates for Hugging Face models. The company is partnering with Equinix on its Distributed AI Hub and has secured customers including Turiam, Cirrascale, AI&, and financial services institutions.
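To put the per-server figures in the summary into per-chip terms, the short Python sketch below divides them evenly across the 32 chips and checks whether the weights of a 671-billion-parameter model would fit in one server's DRAM. It is back-of-the-envelope arithmetic, not a Tenstorrent sizing tool: the function names are hypothetical, "1 TB" is assumed to mean 1000 GB, and weights are assumed to take 1 byte per parameter (an 8-bit format), ignoring block scale factors, KV cache, and activations.

```python
# Per-server figures quoted in the summary.
CHIPS_PER_SERVER = 32
SERVER_PFLOPS_BLOCK_FP8 = 23.0
SERVER_SRAM_GB = 6.2
SERVER_DRAM_GB = 1000.0  # assumption: "1 TB" read as 1000 GB


def per_chip(total: float, chips: int = CHIPS_PER_SERVER) -> float:
    """Divide a server-level figure evenly across the chips in the server."""
    return total / chips


def weights_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight footprint in GB.

    Assumption: 1 byte per parameter (8-bit weights), with no allowance
    for scale factors, KV cache, or activations.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    print(f"PFLOPS per chip:  {per_chip(SERVER_PFLOPS_BLOCK_FP8):.2f}")
    print(f"SRAM per chip MB: {per_chip(SERVER_SRAM_GB) * 1000:.0f}")
    print(f"DRAM per chip GB: {per_chip(SERVER_DRAM_GB):.2f}")

    need = weights_gb(671)  # DeepSeek-671B, parameter count from the summary
    print(f"Weights: {need:.0f} GB, fits in one server: {need < SERVER_DRAM_GB}")
```

Under these assumptions the weights alone occupy roughly two thirds of a single server's DRAM, which gives a rough sense of why the per-server DRAM capacity matters for running prefill and decode on the same machine.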

Key takeaway

For CTOs and VPs of Engineering evaluating AI infrastructure, Tenstorrent's Galaxy Blackhole servers offer an alternative to disaggregated systems. Running prefill and decode on the same hardware, combined with high token generation speeds and a general-purpose architecture, could reduce both complexity and cost per token. Consider piloting these systems for LLM and video generation workloads, especially if your organization values hardware versatility and simplified deployment over specialized, single-purpose solutions.

Key insights

Tenstorrent's Galaxy Blackhole servers offer integrated, general-purpose AI inference for fast token generation without hardware disaggregation.

Principles

General-purpose AI hardware adapts better to evolving models.
Integrated prefill and decode simplifies infrastructure.
Software stack optimization is crucial for hardware versatility.

Method

Tenstorrent uses a highly networked cluster of medium-performance chips with a sophisticated software stack, including a custom compiler and TTLang, to enable general-purpose AI inference and rapid model deployment.

In practice

Deploy Galaxy Blackhole for fast LLM token generation.
Utilize Blitz Mode for high-speed code generation.
Integrate with Equinix's AI Hub for enterprise solutions.
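As a rough way to reason about the Blitz Mode figures quoted in the summary, the sketch below estimates how long one user would wait for a complete response. It assumes a simple model in which the first token arrives at the time to first token and every later token arrives at a constant rate. The function name and defaults are hypothetical; 4.0 seconds is the upper limit of the "sub-4-second" claim and 350 tokens per second is the "up to" peak, so the result is illustrative rather than a measured or guaranteed latency.

```python
def response_time_s(
    output_tokens: int,
    tokens_per_s: float = 350.0,  # "up to" figure from the summary (peak rate)
    ttft_s: float = 4.0,          # upper limit of the "sub-4-second" claim
) -> float:
    """Estimate the wall-clock time to stream a full response to one user.

    Simple model (an assumption): the first token arrives after ttft_s,
    and each remaining token arrives at a constant tokens_per_s.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + (output_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(f"{n:>5} tokens: about {response_time_s(n):.1f} s")
```

In this model, time to first token dominates short responses, while the per-user generation rate dominates long ones such as large code generation outputs.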

Topics

Tenstorrent Galaxy Blackhole
AI Inference Servers
Token Generation
General-Purpose AI
DeepSeek-671B

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, AI Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.