Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292

2026-03-04 · Source: NVIDIA AI Podcast · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, AI Grid Integration · Depth: Intermediate, extended

Summary

EPRI's Director of R&D, Ben Sooter, discusses the evolving relationship between AI data centers and the electric grid, highlighting that approximately 80% of an AI model's lifetime energy consumption occurs during inference, not training. He introduces the concept of micro data centers, which are smaller, geographically dispersed facilities designed to handle inference loads closer to end-users. These micro data centers, typically 3-20 megawatts, can be strategically located near underutilized electrical substations to leverage existing infrastructure, enhance grid resilience, and reduce interconnection queues. This distributed approach aims to meet the growing demand for low-latency AI services while optimizing energy distribution and potentially integrating with clean energy solutions like solar, wind, and energy storage.

Key takeaway

CTOs and VPs of Engineering planning AI infrastructure should expect inference to dominate long-term energy costs and demand patterns. Investigate distributed micro data center strategies, particularly co-location with underutilized substations, to deliver low-latency service and support grid stability while avoiding the bottlenecks and high costs associated with large, centralized training facilities.

Key insights

Inference accounts for approximately 80% of an AI model's lifetime energy consumption, which makes distributed micro data centers attractive for grid resilience and low-latency services.
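
As a rough illustration of what the 80% figure implies, the sketch below derives lifetime inference energy from a known training energy. The function name and the 1,000 MWh input are hypothetical, and the 0.8 share is the approximate figure quoted above, not a measured value for any particular model.

```python
def inference_energy_mwh(training_mwh: float, inference_share: float = 0.8) -> float:
    """Implied lifetime inference energy, given training energy and the
    fraction of lifetime energy attributed to inference.

    If inference is a share s of lifetime energy E, then training is
    (1 - s) * E, so inference = training * s / (1 - s).
    """
    if not 0.0 <= inference_share < 1.0:
        raise ValueError("inference_share must be in [0, 1)")
    return training_mwh * inference_share / (1.0 - inference_share)


# Hypothetical input: a model whose training run used 1,000 MWh.
implied_inference = inference_energy_mwh(1000.0)
ratio = implied_inference / 1000.0  # inference energy per unit of training energy
```

At an 80% share, inference works out to roughly four times the training energy, which is why the siting of inference capacity matters for long-term demand planning.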

Principles

Inference drives most AI energy demand.
Geographic proximity improves AI service latency.
Utilize existing grid infrastructure for efficiency.

Method

Deploy micro data centers (3-20 MW) near underutilized substations, distributing compute capacity across multiple sites to meet regional demand and optimize grid integration.
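
The method above can be sketched as a simple allocation exercise. This is a minimal illustration, not a planning tool: the substation names, spare-capacity figures, and the greedy largest-headroom-first rule are all assumptions made for the example. Only the 3-20 MW site range comes from the episode summary.

```python
MIN_SITE_MW = 3.0   # lower bound of the micro data center range cited above
MAX_SITE_MW = 20.0  # upper bound of that range


def plan_sites(regional_demand_mw: float, spare_capacity_mw: dict[str, float]):
    """Greedy sketch: place sites at the substations with the most spare
    capacity first, capping each site at MAX_SITE_MW and skipping any
    placement smaller than MIN_SITE_MW.

    Returns (allocations, unmet_mw).
    """
    remaining = regional_demand_mw
    allocations: dict[str, float] = {}
    # Sort by spare capacity (descending), then name, for deterministic output.
    ranked = sorted(spare_capacity_mw.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, headroom in ranked:
        if remaining <= 0:
            break
        size = min(headroom, MAX_SITE_MW, remaining)
        if size < MIN_SITE_MW:
            continue  # too small to count as a micro data center here
        allocations[name] = size
        remaining -= size
    return allocations, remaining


# Hypothetical region: 45 MW of inference demand, four candidate substations.
substations = {"A": 25.0, "B": 12.0, "C": 2.0, "D": 18.0}
sites, unmet = plan_sites(45.0, substations)
```

A real siting study would also weigh latency to end users, interconnection timelines, and local generation or storage, none of which this sketch models.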

In practice

Consider micro data centers for latency-sensitive AI applications.
Explore substation co-location for new data center builds.
Integrate energy storage for demand flexibility.

Topics

AI Energy Demand
Micro Data Centers
AI Inference
Electric Grid Integration
Distributed Inference

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Machine Learning Engineer, MLOps Engineer, AI Operations Specialist

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA AI Podcast.