Unbox one of NVIDIA's first co-packaged optics switches with us. See why we bet on CPO early.
Summary
NVIDIA's Quantum-X InfiniBand Photonics Q3450-LD switch uses co-packaged optics (CPO) and marks a significant shift in networking for large GPU clusters, particularly 800G deployments at NVIDIA GB300 NVL72 scale. The 4U liquid-cooled switch is built around the NVIDIA Quantum-X800 ASIC, draws power from a 48V DC busbar, and provides 144 x 800G InfiniBand ports with 115.2 Tb/s of non-blocking switching capacity. Because CPO lowers networking power consumption, more of the power budget can go to GPUs: a 41,472-GPU cluster, for example, could gain the power equivalent of 3,137 additional GPUs. CPO also improves reliability: in a 128,000-GPU data center it eliminates 655,000 discrete pluggable transceivers, each a potential failure point. The Q3450-LD places optical conversion directly beside the switch ASIC, shortening electrical paths from centimeters to micrometers and cutting signal loss from 20dB to 4dB, which removes the need for power-hungry DSPs. Early access to engineering samples enables critical pre-production planning for rack design, cooling, and fiber management.
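The headline figures above can be sanity-checked with simple arithmetic. The Python sketch below uses only numbers quoted in this summary; the variable names are illustrative, and the dB conversion assumes the standard power-ratio definition (loss in dB = 10 x log10 of the input-to-output power ratio).

```python
def db_to_linear_fraction(loss_db: float) -> float:
    """Fraction of signal power that remains after a loss expressed in dB."""
    return 10 ** (-loss_db / 10)

# Aggregate switching capacity: 144 ports x 800 Gb/s each, converted to Tb/s.
ports, port_gbps = 144, 800
capacity_tbps = ports * port_gbps / 1000

# Electrical-path signal loss quoted in the summary, in dB.
pluggable_loss_db, cpo_loss_db = 20.0, 4.0
pluggable_remaining = db_to_linear_fraction(pluggable_loss_db)
cpo_remaining = db_to_linear_fraction(cpo_loss_db)

# Simple ratios derived from the cluster-level figures in the summary.
transceivers_per_gpu = 655_000 / 128_000
gpu_gain_pct = 3_137 / 41_472 * 100

print(f"capacity: {capacity_tbps:.1f} Tb/s")
print(f"power remaining at 20 dB: {pluggable_remaining:.1%}, at 4 dB: {cpo_remaining:.1%}")
print(f"pluggable transceivers per GPU: {transceivers_per_gpu:.2f}")
print(f"power-equivalent GPU gain: {gpu_gain_pct:.1f}%")
```

At these values, the 16 dB difference means roughly 40 times more of the signal power survives the electrical path with CPO, the transceiver figure works out to about five pluggables per GPU, and the 3,137-GPU gain is about 7.6% of the 41,472-GPU cluster.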
Key takeaway
For AI Architects and MLOps Engineers designing large-scale GPU clusters, adopting co-packaged optics (CPO) such as the NVIDIA Q3450-LD is crucial: it lets you fit more GPUs within an existing power envelope and makes the network more reliable for agentic workloads. Plan cooling, power, and fiber management early so that CPO integrates smoothly and your cluster maximizes token throughput and operational uptime.
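To see why lower network power translates into higher GPU density, consider a fixed power budget shared between GPUs and the network. The sketch below is a simplified model, not a calculation from the article: the function name and every input (a 1 MW budget, 1,400 W per GPU, and 60 W versus 20 W of network power per GPU) are hypothetical placeholders chosen only to illustrate the arithmetic.

```python
def gpus_within_budget(budget_w: int, gpu_w: int, network_w_per_gpu: int) -> int:
    """Largest GPU count whose GPU power plus per-GPU network share fits the budget.

    Simplified, hypothetical model: network power scales linearly with GPU count
    and all other facility loads are ignored. Integer watts avoid float rounding.
    """
    return budget_w // (gpu_w + network_w_per_gpu)


# Hypothetical inputs for illustration only; none of these come from the article.
BUDGET_W = 1_000_000          # fixed power envelope (1 MW)
GPU_W = 1_400                 # assumed power per GPU
PLUGGABLE_NET_W_PER_GPU = 60  # assumed network power per GPU with pluggable optics
CPO_NET_W_PER_GPU = 20        # assumed network power per GPU with CPO

before = gpus_within_budget(BUDGET_W, GPU_W, PLUGGABLE_NET_W_PER_GPU)
after = gpus_within_budget(BUDGET_W, GPU_W, CPO_NET_W_PER_GPU)
print(f"GPUs before: {before}, after: {after}, gained: {after - before}")
```

With these placeholder values the same envelope fits 684 GPUs with pluggable optics and 704 with CPO. Substituting measured per-GPU and per-port power applies the same reasoning to a real cluster.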
Key insights
Co-packaged optics (CPO) significantly improves power efficiency and reliability in large-scale GPU cluster networking.
Principles
- Network power impacts GPU headroom.
- Agentic workloads demand resilient networks.
- Early hardware access streamlines deployment.
Method
Installing CPO switches requires upfront planning for rack fit, busbar alignment, liquid-cooling connections, pressure checks, and fiber routing, with vendor and deployment teams working together throughout.
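One way to make that upfront planning concrete is a per-rack readiness check. The sketch below is hypothetical: `RackSite`, its fields, and `open_items` are illustrative names, not part of any NVIDIA or Lambda tool. Only the 4U height, the 48V DC busbar, and liquid cooling come from the summary; the remaining checks mirror the Method items.

```python
from dataclasses import dataclass


@dataclass
class RackSite:
    """Hypothetical description of a rack position being prepared for a CPO switch."""
    free_rack_units: int
    busbar_voltage_v: int
    busbar_aligned: bool
    liquid_cooling_connected: bool
    pressure_check_passed: bool
    fiber_routing_planned: bool


# From the summary: the Q3450-LD is a 4U switch fed by a 48V DC busbar.
REQUIRED_RACK_UNITS = 4
REQUIRED_BUSBAR_V = 48


def open_items(site: RackSite) -> list[str]:
    """Return the Method checklist items that this rack position does not yet satisfy."""
    issues = []
    if site.free_rack_units < REQUIRED_RACK_UNITS:
        issues.append("rack fit")
    if site.busbar_voltage_v != REQUIRED_BUSBAR_V:
        issues.append("busbar voltage")
    if not site.busbar_aligned:
        issues.append("busbar alignment")
    if not site.liquid_cooling_connected:
        issues.append("liquid-cooling connections")
    if not site.pressure_check_passed:
        issues.append("pressure checks")
    if not site.fiber_routing_planned:
        issues.append("fiber routing")
    return issues
```

For example, a rack position that is otherwise ready but still lacks a fiber plan reports `['fiber routing']`, which gives vendor and deployment teams a shared list of what remains before installation.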
In practice
- Deploy CPO to increase GPU density.
- Reduce network failure points in clusters.
- Plan infrastructure with CPO early.
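Removing pluggable transceivers reduces failure points roughly in proportion to how many are removed. The sketch below assumes independent failures at a constant annual failure rate (AFR); the 1% AFR is a hypothetical placeholder, not a vendor or article figure, and the model ignores any failure modes the CPO switches themselves introduce. Only the 655,000-transceiver count comes from the summary.

```python
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours


def expected_annual_failures(component_count: int, annual_failure_rate: float) -> float:
    """Expected failures per year for independent components with a constant AFR."""
    return component_count * annual_failure_rate


def hours_between_failures(component_count: int, annual_failure_rate: float) -> float:
    """Average fleet-wide interval between failures, in hours."""
    return HOURS_PER_YEAR / expected_annual_failures(component_count, annual_failure_rate)


TRANSCEIVERS_REMOVED = 655_000  # from the summary (128,000-GPU data center)
ASSUMED_AFR = 0.01              # hypothetical 1% annual failure rate per transceiver

failures = expected_annual_failures(TRANSCEIVERS_REMOVED, ASSUMED_AFR)
interval_h = hours_between_failures(TRANSCEIVERS_REMOVED, ASSUMED_AFR)
print(f"avoided failures/year: {failures:.0f}, about one every {interval_h * 60:.0f} minutes")
```

Under that placeholder rate, the removed transceivers would otherwise fail about 6,550 times a year, roughly once every 80 minutes across the fleet, each one a link event the cluster would have to absorb.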
Topics
- Co-packaged Optics
- InfiniBand Networking
- GPU Clusters
- AI Infrastructure
- Power Efficiency
- Agentic Workloads
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The Lambda Deep Learning Blog.