Google unveils two new TPUs designed for the "agentic era"

2026-04-22 · Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Google has unveiled its eighth-generation Tensor Processing Units (TPUs): the TPU 8t for training and the TPU 8i for inference, both designed for the "agentic era" of AI. The TPU 8t is engineered to accelerate frontier AI model training, cutting timelines from months to weeks. Its pods hold 9,600 chips with two petabytes of shared high-bandwidth memory and deliver 121 FP4 EFlops of compute per pod, nearly triple the previous Ironwood generation. Google also claims a "goodput" rate of 97%, the share of time spent on useful computation. The TPU 8i, optimized for inference, runs in pods of 1,152 chips, offers 11.6 EFlops per pod, and triples on-chip SRAM to 384 MB to support longer context windows. Both new TPUs use Google's custom Axion ARM CPU as the host and are designed for better power and cooling efficiency: Google claims twice the performance per watt of Ironwood and six times more computing power per unit of electricity in co-designed data centers.
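To put the pod-level numbers on a per-chip footing, the short Python sketch below divides the quoted totals by the chip counts. The per-chip values are derived here, not quoted from Google or the original article. They assume compute and memory are split evenly across chips and that the units are decimal (1 EFlops = 1,000 PFlops, 1 PB = 1,000,000 GB). The summary gives FP4 as the precision only for the TPU 8t figure, so the two compute results may not be directly comparable. The names `PODS` and `per_chip` are illustrative.

```python
# Pod-level figures as quoted in the summary; everything derived below is an
# estimate that assumes an even split across chips and decimal units.
PODS = {
    "TPU 8t": {"chips": 9_600, "eflops": 121.0, "hbm_pb": 2.0},
    # The summary does not give a pod-level memory total for the TPU 8i.
    "TPU 8i": {"chips": 1_152, "eflops": 11.6, "hbm_pb": None},
}


def per_chip(pod):
    """Return (PFlops per chip, HBM GB per chip or None) for one pod entry."""
    pflops = pod["eflops"] * 1_000 / pod["chips"]  # EFlops -> PFlops
    hbm_gb = None
    if pod["hbm_pb"] is not None:
        hbm_gb = pod["hbm_pb"] * 1_000_000 / pod["chips"]  # PB -> GB
    return pflops, hbm_gb


for name, pod in PODS.items():
    pflops, hbm_gb = per_chip(pod)
    memory = "n/a" if hbm_gb is None else f"{hbm_gb:.0f} GB"
    print(f"{name}: ~{pflops:.1f} PFlops per chip, shared HBM per chip: {memory}")
```

Under these assumptions the TPU 8t works out to roughly 12.6 PFlops and about 208 GB of shared memory per chip, and the TPU 8i to roughly 10.1 PFlops per chip.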

Key takeaway

For CTOs and VPs of Engineering evaluating AI infrastructure, Google's new TPU 8t and 8i offer a specialized, efficient alternative to general-purpose accelerators. Your teams should consider these TPUs both for training large-scale frontier models and for deploying agentic AI systems efficiently, which could reduce operational costs and development timelines, especially if you are already invested in the Google Cloud ecosystem.

Key insights

Google's new dual-chip TPU architecture optimizes AI training and inference for the emerging "agentic era."

Principles

Specialized hardware improves AI lifecycle efficiency.
Linear scalability is crucial for frontier AI model training.
Full-stack ARM integration enhances system efficiency.

Method

Google's approach involves developing distinct TPU architectures (8t for training, 8i for inference) and integrating them with custom ARM CPUs and co-designed data centers for end-to-end efficiency.

In practice

Utilize TPU 8t for large-scale AI model training.
Deploy TPU 8i for efficient multi-agent inference workloads.
Leverage Google's custom Axion ARM CPU for host processing.

Topics

TPU 8t
TPU 8i
AI Model Training
AI Inference
Agentic Era

Best for: CTO, VP of Engineering/Data, MLOps Engineer, AI Hardware Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.