Zero-Downtime Patching in Lakebase Part 1: Prewarming

Source: Databricks · Field: Technology & Digital (Cloud Computing & IT Infrastructure, Software Development & Engineering) · Depth: Advanced, short

Summary

Planned database maintenance causes more workload disruption than unplanned failures do, so Lakebase is aiming for "zero-downtime patching." This first post covers "prewarming," a technique that prevents the performance degradation that normally follows a database restart: because the in-memory caches are lost, pgbench TPS typically drops by about 70%. Lakebase's architecture separates stateless compute nodes from disaggregated storage, which lets it prewarm automatically. It spins up a new compute, transfers cache pages from the current primary, and subscribes the new compute to the Write-Ahead Log (WAL) so that its cache stays current until promotion. In experiments with a 10 GB pgbench benchmark, prewarming restored throughput almost instantly for both read-only and read-write workloads, a clear improvement over restarts without it. Prewarming is now standard for planned restarts of read/write endpoints at no extra cost.
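
The prewarming sequence described in this summary can be pictured with a small toy model. This is only a sketch under stated assumptions, not Lakebase's implementation: the `Compute` class, the `prewarm_and_promote` function, and the representation of WAL records as `(page_id, version)` pairs are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Compute:
    """Toy stand-in for a stateless compute node with an in-memory page cache."""
    name: str
    cache: dict[int, int] = field(default_factory=dict)  # page_id -> page version
    is_primary: bool = False

    def apply_wal(self, record: tuple[int, int]) -> None:
        # Refresh a page only if it is already cached; in this toy model,
        # uncached pages would be read from shared storage on demand.
        page_id, version = record
        if page_id in self.cache:
            self.cache[page_id] = version


def prewarm_and_promote(primary: Compute,
                        wal_during_prewarm: list[tuple[int, int]]) -> Compute:
    # 1. Spin up a new compute node with an empty cache.
    replacement = Compute(name=primary.name + "-next")
    # 2. Transfer the cached pages from the current primary.
    replacement.cache = dict(primary.cache)
    # 3. Follow the WAL so the copied pages do not go stale while
    #    the primary keeps serving writes.
    for record in wal_during_prewarm:
        primary.apply_wal(record)
        replacement.apply_wal(record)
    # 4. Promote the warmed node and retire the old primary.
    primary.is_primary = False
    replacement.is_primary = True
    return replacement


old = Compute("primary", cache={1: 1, 2: 1, 3: 1}, is_primary=True)
new = prewarm_and_promote(old, wal_during_prewarm=[(2, 2), (9, 1)])
print(new.cache)  # {1: 1, 2: 2, 3: 1}
```

In this model the promoted node starts with the same cached pages as the old primary, and page 2 reflects the write that arrived during prewarming, which is the property that lets throughput recover without a warm-up period.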

Key takeaway

Prewarming uses Lakebase's disaggregated architecture to avoid the performance degradation that planned database maintenance normally causes. Lakebase spins up a new compute node, preloads its cache from the active primary, and keeps that cache current via the WAL. It then fails over quickly to the warmed node instead of restarting the primary in place, so throughput recovers almost instantly. This avoids the ~70% pgbench TPS drop typical of cold restarts and makes version updates and security patches effectively unnoticeable to users, at no additional cost.
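
To see why a lost cache hurts right after a restart, the toy comparison below replays the same page accesses against an empty (cold) LRU cache and a preloaded (warm) one. It is an illustrative sketch only: the `hit_ratio` helper, the 100-page working set, and the access pattern are assumptions made for this example, and cache hit ratio is a proxy that does not reproduce the pgbench TPS figures quoted above.

```python
from collections import OrderedDict


def hit_ratio(cache: OrderedDict, capacity: int, accesses: list[int]) -> float:
    """Replay page accesses against an LRU cache and return the hit ratio."""
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True  # miss: load the page into the cache
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


working_set = list(range(100))      # hypothetical hot pages
first_requests = working_set[:50]   # first requests after the switch

cold = hit_ratio(OrderedDict(), capacity=100, accesses=first_requests)
warm = hit_ratio(OrderedDict.fromkeys(working_set, True),
                 capacity=100, accesses=first_requests)
print(cold, warm)  # 0.0 1.0
```

Every early request misses on the cold node and has to go to storage, while the warm node serves all of them from memory.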

Best for: DevOps Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.