From monolith to Lakebase to LTAP: rethinking the database from storage up
Summary
Databricks' Lakebase and LTAP architectures rethink the traditional monolithic database. Lakebase externalizes the Write-Ahead Log (WAL) and data files into independent SafeKeeper and PageServer services, which makes Postgres compute instances stateless and enables unlimited storage, serverless elasticity, durable writes, simpler high availability, and instant branching. Building on this, LTAP (Lake Transactional/Analytical Processing) unifies transactional and analytical processing on a single copy of data, stored in the data lake in open columnar formats such as Delta and Iceberg. This removes the need for Change Data Capture (CDC) or data mirroring: analytical engines read fresh data without impacting transactional workloads. LTAP addresses the limitations of traditional HTAP systems by keeping compute engines separate while unifying storage.
Key takeaway
Lakebase and LTAP fundamentally rethink database design, which matters to AI architects and data engineers designing modern data architectures. Decoupling compute from storage gives you unlimited storage, elastic compute, and zero data loss. The architecture also enables real-time analytics directly on transactional data in open formats, eliminating complex CDC pipelines and data synchronization issues. Consider this approach to simplify your data stack and keep data consistent and fresh for both operational and analytical workloads.
Key insights
Lakebase and LTAP decouple database compute from storage, enabling unified transactional and analytical processing on a single, open data copy.
Principles
- Decouple compute and storage for scalability.
- Externalize WAL for durability and availability.
- Unify data storage for transactions and analytics.
Method
Lakebase externalizes two pieces of Postgres state. The WAL goes to SafeKeeper, which replicates it with a Paxos-based protocol. Data files go to PageServer, which materializes changes into cloud object storage in open columnar formats for LTAP.
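The write path described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Lakebase's implementation: the class names `SafeKeeper` and `PageServer` are borrowed from the summary, but every method, the `WalRecord` type, and the `commit` function are hypothetical. A simple majority-acknowledgement rule stands in for the Paxos-based replication, which in a real system also handles leader election and log reconciliation.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int       # log sequence number, assumed strictly increasing
    page_id: str
    value: str


@dataclass
class SafeKeeper:
    """Hypothetical stand-in for one WAL replica."""
    available: bool = True
    log: list = field(default_factory=list)

    def append(self, record: WalRecord) -> bool:
        # An unavailable replica cannot acknowledge the record.
        if not self.available:
            return False
        self.log.append(record)
        return True


class PageServer:
    """Hypothetical stand-in that rebuilds pages by replaying WAL records."""

    def __init__(self):
        self.records = []

    def ingest(self, record: WalRecord) -> None:
        self.records.append(record)

    def get_page(self, page_id: str, at_lsn: int):
        # Replay records in arrival (LSN) order; the last match at or
        # below at_lsn is the page content visible at that point.
        value = None
        for r in self.records:
            if r.page_id == page_id and r.lsn <= at_lsn:
                value = r.value
        return value


def commit(record: WalRecord, safekeepers: list, page_server: PageServer) -> bool:
    """Treat a write as durable once a majority of replicas acknowledge it.

    Simplification: a failed commit may leave the record on a minority of
    replicas; a real protocol would reconcile that, this sketch does not.
    """
    acks = sum(sk.append(record) for sk in safekeepers)
    durable = acks > len(safekeepers) // 2
    if durable:
        page_server.ingest(record)
    return durable
```

In this model the compute node holds no durable state: a commit succeeds as long as a majority of log replicas are reachable, and any page can be reconstructed from the log at a chosen LSN.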
In practice
- Branch large databases in seconds for experiments.
- Run analytics on fresh data without OLTP impact.
- Eliminate CDC pipelines for data synchronization.
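One way to see why branching can take seconds regardless of database size is a copy-on-write model, sketched below. This is an assumption for illustration: the summary does not say how Lakebase implements branches, and the `Branch` class and its methods are hypothetical. A branch records only a pointer to its parent and the LSN at which it forked, then stores its own writes on top.

```python
class Branch:
    """Hypothetical copy-on-write branch over an append-only write history."""

    def __init__(self, parent=None, parent_lsn=0):
        self.parent = parent
        self.parent_lsn = parent_lsn   # parent's LSN at fork time
        self.writes = []               # (lsn, page_id, value), own writes only
        self.next_lsn = parent_lsn + 1

    def write(self, page_id, value):
        lsn = self.next_lsn
        self.writes.append((lsn, page_id, value))
        self.next_lsn += 1
        return lsn

    def read(self, page_id, at_lsn=None):
        if at_lsn is None:
            at_lsn = self.next_lsn - 1
        # Newest own write wins.
        for lsn, pid, value in reversed(self.writes):
            if pid == page_id and lsn <= at_lsn:
                return value
        # Otherwise fall back to the parent, frozen at the fork point.
        if self.parent is not None:
            return self.parent.read(page_id, min(at_lsn, self.parent_lsn))
        return None

    def branch(self):
        # Creating a branch copies no data; it only records the fork point.
        return Branch(parent=self, parent_lsn=self.next_lsn - 1)


main = Branch()
main.write("p1", "a")
main.write("p2", "x")
dev = main.branch()      # constant-time, independent of data volume
dev.write("p1", "b")     # visible only on dev
main.write("p2", "y")    # made after the fork, invisible to dev
```

Under this model an experiment on `dev` never touches `main`, and discarding the branch means dropping only the writes made since the fork.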
Topics
- Lakebase Architecture
- LTAP
- Cloud Object Storage
- Postgres
- Data Lakehouse
- Database Scalability
Best for: MLOps Engineer, CTO, VP of Engineering/Data, Data Engineer, Software Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.