Startup Boosts Scale-Up to 1000+ GPUs in a Single Domain

Source: Big Data & AI News - EE Times · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure) · Depth: Advanced, short

Summary

Startup Delos Data is developing a cluster management software stack and a new server design to enable GPU scale-up domains of more than 1,000 GPUs for AI inference workloads. Its Nonstop AI platform offers flexible topology options and a disaggregated server design that provides 72 ports of 200 Gb/s each per server via OSFPs. The approach aims to reduce cost and power per token by improving GPU utilization in distributed inference, which is sensitive to latency at the nanosecond scale. The system is designed for very large scale-up domains, potentially up to 10,000 GPUs, and includes the Mosaic software stack for graceful failure handling and data re-routing. Broader availability is planned for the fourth quarter of 2026.
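
To put the port count in perspective, the short sketch below multiplies out the figure of 72 ports at 200 Gb/s per server. It is only raw line-rate arithmetic: it assumes every port runs at its full rate at once and ignores encoding and protocol overhead, and the source does not say whether the figure is per direction. The function and constant names are illustrative, not part of any Delos Data software.

```python
# Figures taken from the summary: 72 ports per server, 200 Gb/s each.
PORTS_PER_SERVER = 72
PORT_RATE_GBPS = 200  # gigabits per second, raw line rate per port


def aggregate_bandwidth_gbps(ports: int = PORTS_PER_SERVER,
                             rate_gbps: int = PORT_RATE_GBPS) -> int:
    """Raw aggregate bandwidth of one server in Gb/s.

    Assumption: all ports run at full line rate simultaneously,
    with no encoding or protocol overhead subtracted.
    """
    return ports * rate_gbps


total_gbps = aggregate_bandwidth_gbps()          # 72 * 200 = 14,400 Gb/s
print(f"{total_gbps / 1000:.1f} Tb/s per server")      # terabits per second
print(f"{total_gbps / 8 / 1000:.1f} TB/s per server")  # terabytes per second
```

Under these assumptions the result is 14.4 Tb/s, or 1.8 TB/s, of raw bandwidth per server.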

Key takeaway

For MLOps engineers optimizing large-scale AI inference, Delos Data's disaggregated architecture offers a path to GPU scale-up domains significantly larger than current NVLink limits allow. Evaluate whether flexible topologies and software-managed resilience could lower your operational costs and improve GPU utilization for latency-sensitive workloads, and factor the planned Q4 2026 availability into future infrastructure planning.

Key insights

Disaggregated server design and software enable resilient, large-scale GPU inference clusters with flexible topologies.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.