How flat is replacing fat in AWS data center networks
Summary
AWS is transitioning its data center networks from traditional "fat-tree" architectures to a more efficient "flat" design, leveraging "quasi-random" network topologies and a novel passive optical component called ShuffleBox. This new approach, named RNG (resilient network graphs), became the default for most new AWS data center builds globally by April 2026. The fat-tree design is simple but suffers from inefficiency, congestion, and fragility. Purely random networks offer superior routing and resilience, but they are impractical to build because of their complex cabling and the high computational cost of routing. AWS's solution combines random and deterministic elements: the Spraypoint routing algorithm enables lightweight multipath routing, and ShuffleBoxes simplify the physical cabling. AWS reports that the shift yields a 69% reduction in routers, up to 33% better throughput, and a projected 40% decrease in network equipment electricity consumption. Mathematical models, validated with 530 processor-years of simulation, ensure predictable performance.
Key takeaway
For AI architects and DevOps engineers designing or optimizing cloud infrastructure, AWS's adoption of RNG flat networks signals a critical shift. Evaluate whether quasi-random topologies and passive optical components like ShuffleBoxes could bring comparable gains to your own data center designs; AWS reports a 69% reduction in routers and up to 33% better throughput. Consider applying similar principles to improve network resilience and reduce operational costs, so that your infrastructure supports demanding workloads transparently.
Key insights
AWS's RNG flat network design, using quasi-random topologies and ShuffleBoxes, significantly improves data center efficiency and resilience.
Principles
- Optimal networks have random topologies for diverse paths and resilience.
- Flat networks reduce router overhead and congestion compared to fat trees.
- Combining random and deterministic elements can overcome practical limitations.
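To make the router-overhead principle concrete, here is a back-of-the-envelope sketch in Python. The fat-tree formulas are the standard ones for a three-tier k-ary fat tree. The flat-network side is an assumption made only for illustration: every router is taken to serve hosts directly and to dedicate half its ports to them. The function names are hypothetical, and this is not AWS's own calculation. It also ignores whether the two designs deliver comparable throughput.

```python
import math


def fat_tree_counts(k):
    """Hosts and switches in a classic three-tier k-ary fat tree (k must be even)."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k ** 3 // 4
    # k^2/2 edge switches + k^2/2 aggregation switches + k^2/4 core switches
    switches = 5 * k ** 2 // 4
    return hosts, switches


def flat_router_count(hosts, radix, host_ports):
    """Routers needed when every router attaches hosts directly (assumed port split)."""
    if not 0 < host_ports < radix:
        raise ValueError("host_ports must leave at least one port for router links")
    return math.ceil(hosts / host_ports)


k = 32  # switch radix, chosen arbitrarily for the example
hosts, tree_switches = fat_tree_counts(k)
flat_routers = flat_router_count(hosts, radix=k, host_ports=k // 2)
saving = 1 - flat_routers / tree_switches
print(f"{hosts} hosts: fat tree {tree_switches} switches, "
      f"flat {flat_routers} routers ({saving:.0%} fewer)")
```

With k = 32 this gives 8,192 hosts, 1,280 fat-tree switches and 512 flat routers, a 60% reduction. That figure follows entirely from the assumed half-and-half port split and should not be read as a derivation of the 69% that AWS reports.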
Method
The Spraypoint routing algorithm first "sprays" traffic to randomly chosen neighbors. It then uses shortest-path routing toward designated "waypoints" arranged in rings around each destination, steering traffic through progressively closer rings.
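The spray-then-waypoint idea described above can be illustrated with a toy Python sketch. This is not AWS's Spraypoint implementation, whose details the summary does not give. Several things here are assumptions: rings are taken to be the sets of routers at a given hop count from the destination, one random waypoint is picked per ring, and the topology is a ring plus random chords standing in for a quasi-random graph. All function names and the `outer_ring` parameter are hypothetical.

```python
import random
from collections import deque


def quasi_random_graph(n, chords_per_node, rng):
    """Deterministic ring plus random chords: a stand-in for a quasi-random topology."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for w in [(v + 1) % n] + rng.sample(range(n), chords_per_node):
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    return adj


def hops_to(adj, target):
    """Breadth-first hop count from every router to `target`."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def shortest_path(adj, src, dst):
    """One shortest path from src to dst, found by walking downhill in hop count."""
    dist = hops_to(adj, dst)
    path = [src]
    while path[-1] != dst:
        path.append(min(adj[path[-1]], key=lambda w: (dist[w], w)))
    return path


def spray_route(adj, src, dst, rng, outer_ring=2):
    """Toy route: one random 'spray' hop, then one waypoint per ring, moving inward."""
    if src == dst:
        return [src]
    dist = hops_to(adj, dst)
    # Step 1: spray to a randomly chosen neighbor of the source.
    path = [src, rng.choice(sorted(adj[src]))]
    # Step 2: visit a random waypoint in each ring; ring 0 is the destination itself.
    start = max(0, min(outer_ring, dist[path[-1]] - 1))
    for ring in range(start, -1, -1):
        if path[-1] == dst:
            break
        waypoint = rng.choice(sorted(v for v in adj if dist[v] == ring))
        for hop in shortest_path(adj, path[-1], waypoint)[1:]:
            path.append(hop)
            if hop == dst:  # deliver as soon as the destination is reached
                return path
    return path


rng = random.Random(0)
graph = quasi_random_graph(64, 2, rng)
print(spray_route(graph, 0, 40, rng))
```

Because the spray hop and the waypoints are chosen at random, repeated calls for the same source and destination tend to take different paths, which is the multipath effect the method is after. The toy version can revisit routers and makes no attempt to balance load.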
In practice
- Implement quasi-random topologies for improved network resilience.
- Utilize passive optical components like ShuffleBoxes for practical flat network cabling.
- Develop mathematical models to predict network performance pre-construction.
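As a small-scale illustration of predicting behavior before anything is built, the Python sketch below estimates mean hop count and connectivity for a randomly wired topology as links fail. It is a simulation-style estimate, not the mathematical models AWS describes. The graph construction (a ring plus random chords), the failure model (links removed uniformly at random) and all names are assumptions chosen for the example.

```python
import random
from collections import deque


def quasi_random_graph(n, chords_per_node, rng):
    """Deterministic ring plus random chords: a stand-in for a quasi-random topology."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for w in [(v + 1) % n] + rng.sample(range(n), chords_per_node):
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    return adj


def mean_hops(adj):
    """Return (mean hop count over reachable pairs, True if fully connected)."""
    total = pairs = 0
    connected = True
    for root in adj:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
        connected = connected and len(dist) == len(adj)
    return (total / pairs if pairs else 0.0), connected


def with_failed_links(adj, fraction, rng):
    """Copy of the graph with a random fraction of its links removed."""
    links = sorted({(min(v, w), max(v, w)) for v in adj for w in adj[v]})
    failed = set(rng.sample(links, int(fraction * len(links))))
    return {v: {w for w in adj[v] if (min(v, w), max(v, w)) not in failed}
            for v in adj}


rng = random.Random(0)
healthy = quasi_random_graph(200, 3, rng)
for fraction in (0.0, 0.1, 0.2):
    hops, connected = mean_hops(with_failed_links(healthy, fraction, rng))
    print(f"{fraction:.0%} links failed: mean hops {hops:.2f}, connected={connected}")
```

Running the same loop over different sizes, degrees and seeds gives a rough picture of how path length and connectivity degrade, which is the kind of question a pre-construction model is meant to answer at far larger scale.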
Topics
- Data Center Networking
- Flat Network Topologies
- Quasi-Random Graphs
- ShuffleBox
- Spraypoint Algorithm
- AWS Infrastructure
Best for: CTO, VP of Engineering/Data, MLOps Engineer, AI Architect, DevOps Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Amazon Science.