How flat is replacing fat in AWS data center networks

· Source: Amazon Science homepage · Field: Technology & Digital — Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation, Networking · Depth: Advanced, long

Summary

AWS is transitioning its data center networks from traditional "fat-tree" architectures to a more efficient "flat" design built on "quasi-random" network topologies and a novel passive optical component called ShuffleBox. This new approach, named RNG (resilient network graphs), became the default for most new AWS data center builds globally by April 2026. The fat-tree design is simple, but it suffers from inefficiency, congestion, and fragility. Purely random networks offer better routing and resilience, yet they are impractical to build because of their complex cabling and the heavy computation that routing across them requires. AWS's solution combines random and deterministic elements: the Spraypoint routing algorithm enables lightweight multipath routing, and ShuffleBoxes simplify the physical cabling. The shift yields a 69% reduction in routers, up to 33% better throughput, and a projected 40% decrease in the electricity consumed by network equipment. Mathematical models, validated with 530 processor-years of simulation, make performance predictable.
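To give a feel for why a flat design needs fewer routers, the sketch below counts switches in a textbook three-tier k-ary fat tree (5k²/4 switches serving k³/4 hosts) and compares that with a flat network in which every router attaches hosts directly. The radix, the host-port split, and the function names are illustrative assumptions; the source does not state AWS's actual parameters, and the sketch counts devices only, without claiming that the two networks deliver equal throughput.

```python
def fat_tree_counts(k: int) -> tuple[int, int]:
    """Return (switches, hosts) for a textbook three-tier k-ary fat tree."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches per pod
    aggregation = k * half   # k pods, k/2 aggregation switches per pod
    core = half * half       # (k/2)^2 core switches
    hosts = k * half * half  # each edge switch serves k/2 hosts
    return edge + aggregation + core, hosts


def flat_router_count(hosts: int, host_ports: int) -> int:
    """Routers needed if every router attaches `host_ports` hosts directly.

    Assumption for illustration: the remaining ports on each router are
    used for router-to-router links, so there are no switch-only tiers.
    """
    if host_ports < 1:
        raise ValueError("host_ports must be positive")
    return -(-hosts // host_ports)  # ceiling division


# Hypothetical parameters, not AWS's: radix-32 devices, half the ports
# on each flat router facing hosts (the same split a fat-tree edge uses).
switches, hosts = fat_tree_counts(32)
flat = flat_router_count(hosts, host_ports=16)
reduction = 1 - flat / switches
print(f"fat tree: {switches} switches, flat: {flat} routers, "
      f"reduction: {reduction:.0%}")
```

Under these assumed parameters the saving comes entirely from dropping the aggregation and core tiers, which carry no hosts. The 69% figure in the summary is AWS's reported result and is not derived from this arithmetic.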

Key takeaway

For AI architects and DevOps engineers designing or optimizing cloud infrastructure, AWS's adoption of RNG flat networks signals a significant shift. Evaluate whether quasi-random topologies and passive optical components like ShuffleBoxes could bring comparable gains to your own data center designs; AWS reports 69% fewer routers and up to 33% better throughput. Similar principles may also improve network resilience and reduce operating costs, while keeping the change transparent to the demanding workloads the network carries.

Key insights

AWS's RNG flat network design, using quasi-random topologies and ShuffleBoxes, significantly improves data center efficiency and resilience.


Method

The Spraypoint routing algorithm first "sprays" traffic to randomly chosen neighbors, then uses shortest-path routing toward designated "waypoints" arranged in rings around the destination. Each waypoint guides the traffic on to a ring closer to the destination.
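The description above can be made concrete with a minimal sketch. It is not AWS's implementation: the source gives only the one-line outline, so the stand-in topology (a ring plus random chords), the definition of a ring as "all nodes at the same hop count from the destination", the random choice of waypoints per ring, the nearest-waypoint rule, and every function name here are assumptions made for illustration.

```python
import random
from collections import deque


def build_graph(n, extra_links, rng):
    """Connected stand-in topology: a ring plus random chords (hypothetical)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for w in [(v + 1) % n] + [rng.randrange(n) for _ in range(extra_links)]:
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    return adj


def hop_counts(adj, root):
    """Breadth-first search: hop count from `root` to every node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def shortest_path(adj, src, dst):
    """Walk from src to dst, always stepping to a neighbor nearer to dst."""
    dist = hop_counts(adj, dst)
    path = [src]
    while path[-1] != dst:
        path.append(min(adj[path[-1]], key=lambda w: (dist[w], w)))
    return path


def build_waypoints(adj, dst, per_ring, rng):
    """Assumed scheme: ring d = nodes d hops from dst; sample waypoints per ring."""
    dist = hop_counts(adj, dst)
    rings = {}
    for node, d in dist.items():
        rings.setdefault(d, []).append(node)
    waypoints = {d: rng.sample(sorted(nodes), min(per_ring, len(nodes)))
                 for d, nodes in rings.items()}
    return dist, waypoints


def spraypoint_route(adj, src, dst, dist, waypoints, rng):
    """Spray one random hop, then go waypoint to waypoint, one ring closer each time."""
    if src == dst:
        return [src]
    path = [src, rng.choice(sorted(adj[src]))]  # the "spray" step
    while path[-1] != dst:
        here = path[-1]
        from_here = hop_counts(adj, here)
        # Nearest waypoint in the next ring inward; ring 0 holds only dst.
        target = min(waypoints[dist[here] - 1], key=lambda w: (from_here[w], w))
        path.extend(shortest_path(adj, here, target)[1:])
    return path


rng = random.Random(7)
adj = build_graph(64, 3, rng)
dist, waypoints = build_waypoints(adj, dst=0, per_ring=4, rng=rng)
route = spraypoint_route(adj, 40, 0, dist, waypoints, rng)
print(route)
```

Because every waypoint lies in a strictly closer ring than the node that selected it, the loop always terminates at the destination. The random first hop is what spreads different flows across different paths in this sketch; a production router would presumably precompute its forwarding state rather than run a breadth-first search per packet.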



Best for: CTO, VP of Engineering/Data, MLOps Engineer, AI Architect, DevOps Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Amazon Science homepage.