CloudCons: A Comprehensive End-to-End Benchmark for Cloud Resource Consolidation

2026-02-03 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Data Science & Analytics · Depth: Advanced, extended

Summary

CloudCons is an end-to-end benchmark for evaluating time series forecasting models in the context of cloud resource consolidation, motivated by persistently low CPU utilization (15%-20%) in data centers. It integrates high-quality datasets from Huawei Cloud, Microsoft Azure, and Google Borg, capturing workload patterns that range from diurnal rhythms to stochastic bursts. The benchmark evaluates statistical, deep learning, and foundation models across five dimensions: prediction error, resource efficiency, load balance, service reliability, and uncertainty quantification. A key finding is that the superior zero-shot forecasting accuracy of foundation models such as Chronos 2 and TimesFM 2.5 does not by itself translate into better decision utility for consolidation. The study also identifies the choice of predictive quantile level α as a critical lever: α ≥ 0.8 is recommended for high reliability (VR < 0.01), and α ∈ [0.4, 0.6] for cost-sensitive applications.

Key takeaway

For MLOps engineers optimizing cloud resource allocation: high forecasting accuracy alone does not ensure good consolidation decisions, so prioritize decision-oriented metrics over traditional prediction errors. Calibrate your forecasts through the predictive quantile level α: use α ≥ 0.8 for mission-critical services that require high reliability, and α ∈ [0.4, 0.6] for cost-sensitive applications that need to balance efficiency against service reliability. A higher α reserves more headroom and reduces service violations at the cost of over-provisioning; a lower α does the reverse.
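The trade-off behind the α recommendation can be illustrated with a minimal sketch. It assumes that VR denotes a violation rate, taken here as the fraction of time steps where actual demand exceeds the reserved capacity; the summary does not define the metric, so this reading is an assumption. The function names `empirical_quantile` and `evaluate_alpha`, and the "mean slack" measure of over-provisioning, are hypothetical and not taken from the CloudCons code.

```python
import math


def empirical_quantile(samples, alpha):
    """Return the alpha-quantile of samples using linear interpolation."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ordered = sorted(samples)
    pos = alpha * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def evaluate_alpha(forecast_samples, actuals, alpha):
    """Score one quantile level against observed demand.

    forecast_samples[t] holds sampled demand forecasts for step t,
    actuals[t] holds the demand that was actually observed.
    Returns (violation_rate, mean_slack); both definitions are
    assumptions made for this illustration.
    """
    if not actuals or len(forecast_samples) != len(actuals):
        raise ValueError("need one non-empty sample set per actual value")
    violations = 0
    slack = 0.0
    for samples, actual in zip(forecast_samples, actuals):
        reserved = empirical_quantile(samples, alpha)
        if actual > reserved:
            violations += 1             # demand exceeded the reservation
        else:
            slack += reserved - actual  # capacity reserved but unused
    return violations / len(actuals), slack / len(actuals)
```

Sweeping `alpha` over a grid on held-out data and picking the smallest value whose violation rate stays under the target is one way to apply the guidance; the thresholds themselves come from the paper, not from this sketch.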

Key insights

Forecasting accuracy in cloud resource consolidation does not directly equate to decision utility.

Principles

Decision utility metrics are crucial for evaluating forecasting models.
Predictive quantile selection balances efficiency and reliability.
Foundation models excel in zero-shot generalization for complex workloads.

Method

CloudCons constructs multi-cloud datasets, simulates a forecast-then-optimize workflow, and evaluates models across five dimensions: prediction error, resource efficiency, load balance, service reliability, and uncertainty quantification.
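A forecast-then-optimize loop of this kind can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's implementation: first-fit decreasing bin packing stands in for whatever optimizer CloudCons uses, each VM's reservation is assumed to be its quantile forecast, and the metric definitions (hosts used, mean utilization, standard deviation of host load, share of overloaded hosts) are simplified stand-ins for the resource efficiency, load balance, and service reliability dimensions.

```python
from statistics import pstdev


def first_fit_decreasing(reservations, host_capacity):
    """Pack VM reservations onto as few hosts as a greedy pass allows.

    reservations maps a VM id to its forecast-based reservation.
    Returns a list of hosts, each a list of VM ids.
    """
    hosts = []  # each entry: {"vms": [...], "reserved": total reserved}
    by_size = sorted(reservations.items(), key=lambda kv: kv[1], reverse=True)
    for vm_id, need in by_size:
        if need > host_capacity:
            raise ValueError(f"{vm_id} does not fit on any host")
        for host in hosts:
            if host["reserved"] + need <= host_capacity:
                host["vms"].append(vm_id)
                host["reserved"] += need
                break
        else:
            # No existing host had room, so open a new one.
            hosts.append({"vms": [vm_id], "reserved": need})
    return [host["vms"] for host in hosts]


def score_placement(placement, actual_usage, host_capacity):
    """Evaluate a placement against the demand that actually occurred."""
    loads = [sum(actual_usage[vm] for vm in vms) for vms in placement]
    overloaded = sum(load > host_capacity for load in loads)
    return {
        "hosts_used": len(placement),
        "mean_utilization": sum(loads) / (len(loads) * host_capacity),
        "load_stdev": pstdev(loads),
        "violation_rate": overloaded / len(loads),
    }
```

The point of the split is that the forecast only influences the outcome through `reservations`: two models with different prediction errors can produce the same placement, and a model with lower error can still produce a worse one.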

In practice

Use α ≥ 0.8 for mission-critical services to ensure high reliability.
For cost-sensitive applications, target α ∈ [0.4, 0.6] for efficiency.
Consider traditional statistical models for highly regular, predictable workloads.
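As an illustration of how a traditional model can still supply the quantile forecasts these recommendations rely on, here is a minimal sketch of a seasonal-naive baseline whose α-quantile is derived from its own past residuals. Seasonal naive is chosen here only as a familiar example; the summary does not say which statistical models the benchmark includes, and the function name `seasonal_naive_quantile` is hypothetical.

```python
import math


def seasonal_naive_quantile(history, season_length, alpha):
    """Forecast the next season at quantile level alpha.

    The point forecast repeats the last observed season. The quantile
    offset is the nearest-rank alpha-quantile of past seasonal-naive
    residuals, i.e. history[t] - history[t - season_length].
    """
    if season_length <= 0:
        raise ValueError("season_length must be positive")
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    residuals = sorted(
        history[t] - history[t - season_length]
        for t in range(season_length, len(history))
    )
    rank = max(1, math.ceil(alpha * len(residuals)))  # nearest-rank rule
    offset = residuals[rank - 1]
    last_season = history[-season_length:]
    # Resource demand cannot be negative, so clamp at zero.
    return [max(0, value + offset) for value in last_season]
```

On a strongly diurnal workload the residuals are small, so the reservation at α ≥ 0.8 sits only slightly above the repeated pattern, which is one reason a simple baseline can remain competitive there.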

Topics

Cloud Resource Consolidation
Time Series Foundation Models
Forecasting Benchmarks
AIOps
Predictive Quantiles
Service Reliability

Code references

dmwyd/CloudCons

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.