AWS Replaces Fat-Tree Data Center Networks with Random Graph Theory, Cutting Routers by 69%
Summary
AWS has adopted Resilient Network Graphs (RNG), a flat network architecture based on quasi-random graph theory, as the default for most new non-GPU data center builds globally as of April 2026. This marks the first large-scale production deployment of expander-based network fabrics, replacing traditional fat-tree topologies. According to the accompanying arXiv paper, RNG uses 69% fewer networking devices, delivers up to 33% higher throughput, and is projected to cut network equipment power consumption by 40%. It eliminates the hierarchical spine and leaf layers: top-of-rack (ToR) switches connect directly to one another through passive optical ShuffleBoxes, which create the quasi-random physical links, and a custom protocol called Spraypoint handles distributed routing. The design is intended to lose capacity in proportion to the hardware that fails rather than fail catastrophically. Validation included 530 processor-years of simulation, and initial deployments began in Ireland, Germany, and Spain in late 2024.
Key takeaway
For network architects evaluating next-generation data center designs, AWS's adoption of Resilient Network Graphs (RNG) demonstrates a viable path to significantly lower hardware counts and power consumption. Investigate expander-based network fabrics as an alternative to traditional fat-tree topologies, especially for general-purpose compute workloads. Consider how a flat, quasi-random network could improve your infrastructure's resilience and throughput; AWS reports 69% fewer networking devices for its own builds.
Key insights
Random graph theory applied to data center networks significantly reduces hardware while improving resilience and throughput.
Principles
- Quasi-random connectivity between top-of-rack switches raises throughput while using fewer devices.
- Passive optical devices simplify complex physical topologies.
- Distributed routing enhances fault tolerance.
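The fault-tolerance principle can be explored with a small experiment. The sketch below is an illustration only, not AWS's simulation: it assumes a hypothetical stand-in topology (a ring plus random chords, chosen because the ring guarantees a connected starting point), removes a random fraction of links, and measures how many switch pairs can still reach each other. All function names and parameters are invented for this example, and reachability is used as a simple proxy; it does not measure throughput.

```python
import random
from collections import deque

def ring_with_random_chords(n, chords_per_node, rng):
    """Hypothetical stand-in topology: a ring (guarantees connectivity) plus random chords."""
    links = {frozenset((i, (i + 1) % n)) for i in range(n)}
    target = n + n * chords_per_node // 2
    while len(links) < target:
        a, b = rng.sample(range(n), 2)  # two distinct switches
        links.add(frozenset((a, b)))    # the set silently drops duplicate links
    return sorted(tuple(sorted(link)) for link in links)

def reachable_pair_fraction(n, links):
    """Fraction of ordered switch pairs that can still reach each other."""
    adj = {v: [] for v in range(n)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    reachable = 0
    for src in range(n):
        seen = {src}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        reachable += len(seen) - 1
    return reachable / (n * (n - 1))

def degradation_curve(n=40, chords_per_node=4, fractions=(0.0, 0.1, 0.3, 0.5), seed=7):
    """Return (failed_fraction, reachable_pair_fraction) for each failure level."""
    rng = random.Random(seed)
    links = ring_with_random_chords(n, chords_per_node, rng)
    curve = []
    for failed in fractions:
        survivors = rng.sample(links, round(len(links) * (1 - failed)))
        curve.append((failed, reachable_pair_fraction(n, survivors)))
    return curve

if __name__ == "__main__":
    for failed, reachable in degradation_curve():
        print(f"{failed:.0%} of links failed -> {reachable:.1%} of pairs reachable")
```

Running the same experiment on a tree-shaped topology, where a single upper-layer failure can cut off many racks at once, is a natural comparison to try.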
Method
AWS implemented RNG by replacing the spine and leaf layers with a mesh of ToR switches. ShuffleBoxes provide the quasi-random links between switches, and Spraypoint, a custom distributed routing protocol, routes packets across the resulting flat topology.
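To make the idea of a quasi-random ToR mesh concrete, here is a minimal sketch that builds a random regular graph, in which every switch has the same number of uplinks to randomly chosen peers, and reports the mean hop count between switches. This is a generic textbook-style construction (random stub pairing with retries), not the ShuffleBox wiring scheme; the names `build_flat_fabric`, `n_tors`, and `uplinks` are assumptions made for illustration.

```python
import random
from collections import deque

def try_random_regular(n, d, rng):
    """One attempt at a simple d-regular graph via random stub pairing; None if stuck."""
    adj = {v: set() for v in range(n)}
    stubs = [v for v in range(n) for _ in range(d)]  # one stub per uplink port
    while stubs:
        rng.shuffle(stubs)
        leftover = []
        for a, b in zip(stubs[0::2], stubs[1::2]):
            if a != b and b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
            else:
                leftover += [a, b]  # self-loop or duplicate link: re-pair these stubs
        if len(leftover) == len(stubs):
            return None  # no pair could be placed in this round
        stubs = leftover
    return adj

def hop_counts(adj, src):
    """Breadth-first search: hops from src to every reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_flat_fabric(n_tors, uplinks, seed=0):
    """Retry until the random graph is both regular and connected."""
    if not 2 <= uplinks < n_tors or (n_tors * uplinks) % 2:
        raise ValueError("need 2 <= uplinks < n_tors and an even number of ports in total")
    rng = random.Random(seed)
    while True:
        adj = try_random_regular(n_tors, uplinks, rng)
        if adj is not None and len(hop_counts(adj, 0)) == n_tors:
            return adj

def mean_hops(adj):
    """Average shortest-path length over all ordered switch pairs."""
    n = len(adj)
    total = sum(sum(hop_counts(adj, s).values()) for s in adj)
    return total / (n * (n - 1))

if __name__ == "__main__":
    fabric = build_flat_fabric(n_tors=64, uplinks=8, seed=1)
    print(f"mean hops between ToRs: {mean_hops(fabric):.2f}")
```

Varying `uplinks` shows the trade-off a designer faces: more switch-to-switch ports shorten paths but leave fewer ports for servers.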
In practice
- Evaluate expander-based network fabrics for general compute.
- Consider passive optical devices for physical layer randomization.
- Design routing protocols for flat, distributed topologies.
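As a starting point for the routing item above, the sketch below shows one generic way to spread traffic over a flat topology: at each hop, forward to a randomly chosen neighbor that is strictly closer to the destination. This is not the Spraypoint protocol, whose details are not described here. The six-switch topology, the `LINKS` list, and the function names are all hypothetical, and distances are computed centrally for brevity, whereas a distributed protocol would have each switch maintain its own table.

```python
import random
from collections import deque

# Hypothetical 6-switch flat topology; every switch has 3 links.
LINKS = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 5), (4, 5), (1, 3), (2, 5)]

def build_adj(links, failed=()):
    """Adjacency lists for the surviving links."""
    down = {frozenset(link) for link in failed}
    adj = {}
    for a, b in links:
        adj.setdefault(a, [])
        adj.setdefault(b, [])
        if frozenset((a, b)) not in down:
            adj[a].append(b)
            adj[b].append(a)
    return adj

def distances_to(adj, dst):
    """Hop count from every reachable switch to dst (breadth-first search)."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def spray_route(adj, src, dst, rng):
    """Walk from src to dst, picking uniformly among next hops that are one hop closer."""
    dist = distances_to(adj, dst)
    if src not in dist:
        return None  # destination unreachable after failures
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        closer = [v for v in adj[here] if dist[v] == dist[here] - 1]
        path.append(rng.choice(closer))
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    healthy = build_adj(LINKS)
    print({tuple(spray_route(healthy, 0, 5, rng)) for _ in range(50)})
    degraded = build_adj(LINKS, failed=[(0, 2), (0, 3)])
    print(spray_route(degraded, 0, 5, rng))  # longer path, but still delivered
```

With both shortest-path links out of switch 0 failed, traffic still reaches switch 5 over a longer route, which is the behavior a flat topology with many alternative paths is meant to provide.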
Topics
- Resilient Network Graphs
- Data Center Networks
- Random Graph Theory
- Network Topology
- ShuffleBox
- Spraypoint
Best for: CTO, VP of Engineering/Data, MLOps Engineer, AI Architect, DevOps Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.