Startup Boosts Scale-Up to 1000+ GPUs in a Single Domain
Summary
Delos Data, a startup, is developing a cluster management software stack and a new server design to enable GPU scale-up domains of more than 1,000 GPUs for AI inference workloads. Its Nonstop AI platform offers flexible topology options, a disaggregated server design, and 72 ports per server at 200 Gb/s each, exposed via OSFP. The approach aims to reduce cost and power per token by improving GPU utilization, and it targets distributed inference, which is sensitive to latency on the order of nanoseconds. The system is designed for very large scale-up domains, potentially up to 10,000 GPUs, and its Mosaic software stack handles failures gracefully by re-routing data. Broader availability is planned for the fourth quarter of 2026.
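To put the port figures in the summary into perspective, the short calculation below multiplies the stated port count by the stated per-port rate. It is only a back-of-the-envelope sketch: the function and constant names are illustrative, and the result is raw line rate, which says nothing about duplex accounting, encoding overhead, or achievable throughput.

```python
# Figures taken from the summary: 72 ports per server at 200 Gb/s each.
PORTS_PER_SERVER = 72
PORT_SPEED_GBPS = 200


def aggregate_bandwidth_gbps(ports: int, gbps_per_port: int) -> int:
    """Return the raw aggregate line rate in Gb/s (illustrative helper)."""
    return ports * gbps_per_port


total_gbps = aggregate_bandwidth_gbps(PORTS_PER_SERVER, PORT_SPEED_GBPS)
total_tbps = total_gbps / 1000  # 1 Tb/s = 1000 Gb/s

print(f"Raw aggregate per server: {total_gbps} Gb/s ({total_tbps} Tb/s)")
```

Under these assumptions, each server exposes 14,400 Gb/s (14.4 Tb/s) of raw port capacity.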
Key takeaway
For MLOps engineers optimizing large-scale AI inference, Delos Data's disaggregated architecture offers a path to significantly larger GPU scale-up domains than current NVLink limits allow. Evaluate how flexible topologies and software-managed resilience could reduce your operational costs and improve GPU utilization for latency-sensitive workloads, and factor the planned Q4 2026 availability into future infrastructure planning.
Key insights
Disaggregated server design and software enable resilient, large-scale GPU inference clusters with flexible topologies.
Principles
- Distributed inference demands nanosecond-scale latency and always-on reliability.
- Modular architectures enhance flexibility and physical disaggregation.
- Software is crucial for managing resilience in large-scale networks.
In practice
- Design for 1000+ GPU scale-up domains.
- Use OSFP-based cabling for flexible interconnects.
- Implement software for graceful failure handling.
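The last point, software that handles failures gracefully by re-routing data, can be illustrated with a generic sketch. This is not Delos Data's Mosaic stack, whose internals the source does not describe; it is a plain breadth-first search over a hypothetical four-node ring, and every name in it (`shortest_path`, the `gpu0`..`gpu3` labels, the link list) is an assumption made for illustration.

```python
from collections import deque


def shortest_path(links, src, dst, failed=frozenset()):
    """Breadth-first search over undirected links, skipping failed ones.

    Returns the list of nodes from src to dst, or None if dst is unreachable.
    """
    # Build an adjacency map from the links that are still healthy.
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to src, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical ring topology of four nodes.
links = [("gpu0", "gpu1"), ("gpu1", "gpu2"), ("gpu2", "gpu3"), ("gpu3", "gpu0")]

healthy_route = shortest_path(links, "gpu0", "gpu1")

# Simulate a failure of the direct gpu0-gpu1 link and route around it.
failed_links = {frozenset(("gpu0", "gpu1"))}
rerouted = shortest_path(links, "gpu0", "gpu1", failed_links)

print(healthy_route)
print(rerouted)
```

In this toy ring, losing the direct link forces traffic the long way around instead of dropping it. A production system would also need failure detection, load awareness, and in-flight traffic handling, none of which is modeled here.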
Topics
- GPU Scale-Up
- AI Inference
- Cluster Management
- Disaggregated Systems
- Network Topology
- Data Center Interconnects
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.