Our eighth generation TPUs: two chips for the agentic era
Summary
Google has introduced its eighth generation of custom Tensor Processing Units (TPUs), featuring two specialized chips: the TPU 8t and the TPU 8i. The TPU 8t is designed for massive, compute-intensive AI model training, offering nearly 3x the compute performance per pod of the previous generation and scaling to 9,600 chips with 121 ExaFlops of compute. The TPU 8i is optimized for the low-latency inference workloads that AI agents depend on, and delivers 80% better performance-per-dollar than its predecessor. Both chips are engineered for high power efficiency, achieving up to two times better performance-per-watt, and are supported by Google's Axion ARM-based CPUs and fourth-generation liquid cooling technology. These TPUs will be generally available later this year.
Key takeaway
For MLOps Engineers and CTOs deploying or developing large-scale AI, Google's new TPU 8t and 8i offer specialized hardware for training and inference, respectively. You should evaluate these chips for their potential to significantly reduce model development cycles and improve inference performance-per-dollar, especially for agentic AI workloads. Consider requesting more information now to prepare for their general availability later this year.
Key insights
Google's new TPU 8t and 8i chips specialize in AI training and inference for the agentic era.
Principles
- Specialization unlocks significant efficiencies.
- Co-designing silicon, hardware, and software maximizes performance.
- System-level power efficiency is critical for large-scale AI.
Method
Google's co-design approach integrates custom silicon, networking, and software, including model architecture, to optimize power efficiency and performance across the entire AI supercomputing stack.
In practice
- TPU 8t offers 121 ExaFlops for complex model training.
- TPU 8i provides 288 GB of HBM for latency-sensitive inference.
- Both support JAX, PyTorch, SGLang, and vLLM frameworks.
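As a rough sanity check on the TPU 8t figures quoted in the summary (9,600 chips, 121 ExaFlops per pod), the sketch below derives the average compute per chip. It assumes compute is spread evenly across the pod; the summary does not state the numeric precision behind the ExaFlops figure, and the function and constant names are illustrative only.

```python
# Back-of-envelope: average compute per chip implied by the quoted pod figures.
POD_EXAFLOPS = 121      # quoted pod-level compute (numeric precision not stated)
CHIPS_PER_POD = 9_600   # quoted maximum pod size


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Average PetaFlops per chip, assuming an even split across the pod."""
    return pod_exaflops * 1_000 / chips  # 1 ExaFlop = 1,000 PetaFlops


if __name__ == "__main__":
    value = per_chip_petaflops(POD_EXAFLOPS, CHIPS_PER_POD)
    print(f"{value:.1f} PetaFlops per chip")
```

Under these assumptions the quoted figures work out to roughly 12.6 PetaFlops per chip, which is an average derived from the pod totals rather than a published per-chip specification.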
Topics
- TPU 8t
- TPU 8i
- AI Agents
- Machine Learning Infrastructure
- Power Efficiency
Best for: CTO, MLOps Engineer, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by The Keyword.