Our eighth generation TPUs: two chips for the agentic era

2026-04-22 · Source: The Keyword · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Advanced, long

Summary

Google has introduced its eighth generation of custom Tensor Processing Units (TPUs), featuring two specialized chips: the TPU 8t and the TPU 8i. The TPU 8t is designed for massive, compute-intensive AI model training, offering nearly 3x the compute performance per pod over the previous generation and scaling to 9,600 chips with 121 ExaFlops of compute. The TPU 8i is optimized for the low-latency inference workloads that AI agents depend on, and delivers 80% better performance-per-dollar than its predecessor. Both chips are engineered for high power efficiency, achieving up to 2x better performance-per-watt, and are supported by Google's Arm-based Axion CPUs and fourth-generation liquid cooling technology. These TPUs will be generally available later this year.

Key takeaway

For MLOps Engineers and CTOs deploying or developing large-scale AI, Google's new TPU 8t and 8i offer specialized hardware for training and inference, respectively. You should evaluate these chips for their potential to significantly reduce model development cycles and improve inference performance-per-dollar, especially for agentic AI workloads. Consider requesting more information now to prepare for their general availability later this year.
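
As a rough aid for that evaluation, the sketch below converts a quoted performance-per-dollar gain into a relative cost per unit of work. It assumes the 80% figure applies uniformly to your workload, which headline vendor numbers do not guarantee. The function name `relative_cost` is illustrative and not part of any Google tooling.

```python
def relative_cost(perf_per_dollar_gain: float) -> float:
    """Return cost per unit of work relative to the previous generation.

    perf_per_dollar_gain is fractional: 0.80 means "80% better
    performance-per-dollar", i.e. 1.8x the work for the same spend.
    """
    if perf_per_dollar_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 / (1.0 + perf_per_dollar_gain)


# Quoted TPU 8i figure from the summary above (assumed to hold for your workload).
tpu8i_relative_cost = relative_cost(0.80)
print(f"Relative cost per unit of inference work: {tpu8i_relative_cost:.2f}")
print(f"Implied cost reduction: {(1.0 - tpu8i_relative_cost) * 100:.0f}%")
```

Under that assumption, an 80% performance-per-dollar gain corresponds to roughly 0.56x the cost for the same work, or about a 44% reduction, not an 80% one.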

Key insights

Google's new TPU 8t and 8i chips specialize in AI training and inference for the agentic era.

Principles

Specialization unlocks significant efficiencies.
Co-designing silicon, hardware, and software maximizes performance.
System-level power efficiency is critical for large-scale AI.

Method

Google's co-design approach integrates custom silicon, networking, and software, including model architecture, to optimize power efficiency and performance across the entire AI supercomputing stack.

In practice

TPU 8t scales to 9,600 chips and 121 ExaFlops per pod for complex model training.
TPU 8i provides 288 GB HBM for latency-sensitive inference.
Both support JAX, PyTorch, SGLang, and vLLM frameworks.
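
To put the pod-level training figure in perspective, this sketch divides the quoted 121 ExaFlops by the 9,600-chip scale given in the summary. It assumes compute is spread evenly across chips and that both numbers describe the same pod configuration. The source does not state the numeric precision behind the ExaFlops figure, so treat the result as an order-of-magnitude estimate.

```python
# Figures quoted in the summary for TPU 8t (assumed to describe one pod).
POD_EXAFLOPS = 121.0
POD_CHIPS = 9_600


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Average compute per chip in PetaFlops, assuming an even split."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    # 1 ExaFlop = 1,000 PetaFlops
    return pod_exaflops * 1_000.0 / chips


estimate = per_chip_petaflops(POD_EXAFLOPS, POD_CHIPS)
print(f"Approximate compute per TPU 8t chip: {estimate:.1f} PetaFlops")
```

The estimate works out to roughly 12.6 PetaFlops per chip, which is a derived average and not a specification published in the source.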

Topics

TPU 8t
TPU 8i
AI Agents
Machine Learning Infrastructure
Power Efficiency

Best for: CTO, MLOps Engineer, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by The Keyword.