AWS Replaces Fat-Tree Data Center Networks with Random Graph Theory, Cutting Routers by 69%

Source: InfoQ · Field: Technology & Digital (Cloud Computing & IT Infrastructure; Emerging Technologies & Innovation) · Depth: Intermediate, short

Summary

As of April 2026, AWS uses Resilient Network Graphs (RNG), a flat network architecture based on quasi-random graph theory, as the default for most new non-GPU data center builds worldwide. This marks the first large-scale production deployment of an expander-based network fabric in place of a traditional fat-tree topology. According to the accompanying arXiv paper, RNG uses 69% fewer networking devices, delivers up to 33% higher throughput, and is projected to cut network equipment power consumption by 40%. The design removes the hierarchical spine and leaf layers: top-of-rack (ToR) switches connect directly to one another through passive optical ShuffleBoxes, which provide the quasi-random physical links, and a custom protocol, Spraypoint, handles distributed routing. With this design, failures degrade the network proportionally rather than catastrophically. Validation included 530 processor-years of simulation, and initial deployments began in Ireland, Germany, and Spain in late 2024.

Key takeaway

For network architects evaluating next-generation data center designs, AWS's adoption of Resilient Network Graphs (RNG) shows that a flat fabric can substantially reduce hardware count and power consumption at production scale. Investigate expander-based network fabrics as an alternative to traditional fat-tree topologies, especially for general-purpose compute workloads. Consider how a flat, quasi-random network could improve your infrastructure's resilience and throughput, potentially cutting device count by more than 60%.
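To get a feel for where savings of this size can come from, the following back-of-the-envelope sketch counts switches only. It uses the standard formulas for a three-tier k-ary fat-tree (k³/4 hosts, 5k²/4 switches) and compares them with a flat fabric in which every switch is a ToR. The split of k/2 host-facing ports per ToR is an assumption made for illustration, as are the function names. The sketch says nothing about whether the flat fabric delivers equivalent throughput, and it is not how AWS derived its 69% figure.

```python
def fat_tree_counts(k):
    """Hosts and switches in a classic three-tier k-ary fat-tree (k even)."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k ** 3 // 4
    # k^2/2 edge + k^2/2 aggregation + k^2/4 core switches
    switches = 5 * k ** 2 // 4
    return hosts, switches


def flat_switch_count(hosts, host_ports_per_tor):
    """ToR switches needed when every switch is a ToR (ceiling division)."""
    return -(-hosts // host_ports_per_tor)


k = 32  # switch radix, chosen only for illustration
hosts, tree_switches = fat_tree_counts(k)
# Assumption: each ToR keeps k/2 ports for hosts, the rest link to other ToRs.
flat_switches = flat_switch_count(hosts, host_ports_per_tor=k // 2)
saving = 1 - flat_switches / tree_switches

print(f"hosts={hosts} fat-tree={tree_switches} flat={flat_switches} saving={saving:.0%}")
```

Under these assumptions the flat fabric keeps only the k²/2 edge-layer switches, so the count drops by 60% regardless of k. Real designs trade host-facing ports against inter-switch ports, which moves this number in either direction.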

Key insights

Applying random graph theory to data center networks cut the networking device count by 69% in AWS's design, while raising throughput by up to 33% and improving resilience.

Method

AWS implemented RNG by replacing the spine and leaf layers with a mesh of top-of-rack (ToR) switches. Passive optical ShuffleBoxes provide the quasi-random links between the switches, and Spraypoint, a custom distributed routing protocol, guides packets across the mesh.
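The idea of a flat, quasi-random fabric can be illustrated with a small simulation. The sketch below is not AWS's wiring scheme or the Spraypoint protocol, neither of which is described in this summary. It assumes a toy fabric of 32 ToR switches with 4 inter-switch links each, wires them as a random regular graph using the textbook pairing model, and then removes a random fraction of links to see how reachability and hop counts change. All function names and parameters are hypothetical.

```python
import random
from collections import deque


def random_regular_graph(n, d, seed=0):
    """Adjacency sets for a random d-regular simple graph on n switches.

    Pairing model with rejection: give each switch d ports, shuffle all
    ports, pair them off, and retry on any self-loop or duplicate link.
    Practical only for small d, since rejections grow quickly with d.
    """
    if (n * d) % 2 or d >= n:
        raise ValueError("need n * d even and d < n")
    rng = random.Random(seed)
    while True:
        ports = [node for node in range(n) for _ in range(d)]
        rng.shuffle(ports)
        adj = {node: set() for node in range(n)}
        ok = True
        for a, b in zip(ports[::2], ports[1::2]):
            if a == b or b in adj[a]:
                ok = False
                break
            adj[a].add(b)
            adj[b].add(a)
        if ok:
            return adj


def hop_counts(adj, src):
    """Breadth-first search: hops from src to every reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fail_links(adj, fraction, seed=0):
    """Return a copy of the fabric with a random fraction of links removed."""
    rng = random.Random(seed)
    links = sorted((a, b) for a in adj for b in adj[a] if a < b)
    dead = set(rng.sample(links, int(len(links) * fraction)))
    return {
        a: {b for b in adj[a] if (min(a, b), max(a, b)) not in dead}
        for a in adj
    }


if __name__ == "__main__":
    fabric = random_regular_graph(n=32, d=4, seed=1)
    for fraction in (0.0, 0.1, 0.2, 0.3):
        hops = hop_counts(fail_links(fabric, fraction, seed=2), 0)
        print(f"{fraction:.0%} links failed: {len(hops)}/32 switches "
              f"reachable from switch 0, max {max(hops.values())} hops")
```

Running the script prints reachability and the longest shortest path from switch 0 at each failure level. Because no single layer concentrates the paths, a handful of link failures in this toy model removes only those links' share of connectivity, which is the intuition behind proportional degradation. A production design would also need to account for routing, load balancing, and capacity, none of which this sketch models.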

Best for: CTO, VP of Engineering/Data, MLOps Engineer, AI Architect, DevOps Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.