Scheduling in a changing world: Maximizing throughput with time-varying capacity

· Source: The latest research from Google · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Expert, medium

Summary

Google Research introduced new, provably effective algorithms for non-preemptive job scheduling on cloud infrastructure with time-varying machine availability, a common scenario in large-scale cloud computing. The research, presented at SPAA 2025, addresses the problem of maximizing throughput (total weight or number of successful jobs) when resources fluctuate due to hardware failure, maintenance, or high-priority tasks claiming capacity. For the offline setting, where future capacity is known, a Greedy strategy achieved a 1/2-approximation for unit profit jobs and a 1/4-approximation using a primal-dual framework for varying profits. In the more complex online setting, where jobs arrive dynamically, standard non-preemptive algorithms fail. The team developed a variant of Greedy for "interruption with restarts" that also achieves a 1/2-competitive ratio. For the stricter "interruption without restarts" model, a novel algorithm achieved a 1/11 competitive ratio for common deadline instances, representing the first constant competitive ratio for this problem.

Key takeaway

For AI Scientists designing or optimizing cloud resource schedulers, this research highlights the necessity of moving beyond static capacity assumptions. You should evaluate your current scheduling algorithms against the challenges of time-varying capacity, particularly for non-preemptive batch jobs. Consider implementing algorithms that allow for job restarts to significantly improve throughput in dynamic online environments, or explore the proposed constant competitive ratio algorithms for stricter "interruption without restarts" scenarios, especially when jobs share common deadlines.

Key insights

New algorithms optimize non-preemptive job scheduling in dynamic cloud environments with fluctuating machine capacity.

Principles

Method

The online algorithm for "interruption without restarts" maintains a tentative schedule, modifying it by adding, replacing, interrupting, or discarding jobs based on arrival and current state, particularly for common deadline scenarios.

In practice

Topics

Best for: AI Scientist, AI Researcher, Research Scientist, AI Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by The latest research from Google.