DuckLake 1.0: Data Lake Format with SQL Catalog Metadata
Summary
DuckDB Labs released DuckLake 1.0 on May 2, 2026. DuckLake is a data lake format that stores table metadata directly in a SQL database rather than in the metadata files used by formats such as Apache Iceberg, Delta Lake, and Apache Hudi. The initial implementation ships as a DuckDB extension and offers catalog-stored small updates, enhanced sorting and partitioning, and compatibility with Iceberg-style data features. Key improvements include data inlining for small inserts, updates, and deletes (with a default threshold of 10 rows), sorted tables for faster filtered queries, bucket partitioning for high-cardinality columns, and improved support for the geometry data type. The release is presented as production-ready with backward compatibility. Clients are available for Apache DataFusion, Apache Spark, Trino, and Pandas, and MotherDuck offers a hosted service.
Key takeaway
For data platform engineers evaluating data lake formats, DuckLake 1.0 is a credible alternative: by centralizing metadata in a SQL database, it may relieve the small-file problem and speed up metadata operations. Benchmark it against your own production workloads, especially if your current setup struggles with frequent small updates or complex metadata coordination, and check how well it fits the Iceberg features you already rely on.
Key insights
DuckLake 1.0 centralizes data lake metadata in a SQL database, so metadata operations become database queries and transactions instead of reads and writes of coordinated metadata files.
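To make the idea concrete, here is a minimal sketch of a catalog kept in a SQL database, using Python's built-in sqlite3 as a stand-in. The `data_file` table, its `min_ts`/`max_ts` columns, and the file names are hypothetical and are not DuckLake's actual metadata schema. The sketch shows how choosing which files to scan for a filter can be a single query, and why a sorted layout (disjoint value ranges per file) lets that query skip more files.

```python
import sqlite3


def build_catalog(file_stats):
    """Create a toy in-memory catalog with one row of min/max stats per data file."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    con.execute(
        "CREATE TABLE data_file (path TEXT PRIMARY KEY, min_ts INTEGER, max_ts INTEGER)"
    )
    con.executemany("INSERT INTO data_file VALUES (?, ?, ?)", file_stats)
    con.commit()
    return con


def files_for_range(con, lo, hi):
    """Return paths of files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    rows = con.execute(
        "SELECT path FROM data_file WHERE max_ts >= ? AND min_ts <= ? ORDER BY path",
        (lo, hi),
    ).fetchall()
    return [row[0] for row in rows]


# Sorted layout: each file covers a disjoint range of the sort column.
sorted_catalog = build_catalog([
    ("f1.parquet", 0, 99),
    ("f2.parquet", 100, 199),
    ("f3.parquet", 200, 299),
])

# Unsorted layout: every file spans almost the whole value range.
unsorted_catalog = build_catalog([
    ("f1.parquet", 0, 297),
    ("f2.parquet", 1, 298),
    ("f3.parquet", 2, 299),
])

sorted_hits = files_for_range(sorted_catalog, 120, 150)
unsorted_hits = files_for_range(unsorted_catalog, 120, 150)
print(sorted_hits)    # only the file covering 100..199
print(unsorted_hits)  # all three files must be scanned
```

In this toy setup the same filter touches one file under the sorted layout and all three under the unsorted one. A real engine tracks more statistics than this, but the planning step still reduces to a query against the catalog.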
Principles
- Database-centric metadata avoids small file proliferation.
- Inlining small operations enhances update efficiency.
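The two principles above can be illustrated with a toy model. This is a sketch under stated assumptions, not DuckLake's implementation: the `ToyLake` class, its table names, and the JSON data files are invented for illustration, and only the 10-row default comes from the summary. Whether the real threshold is inclusive at exactly 10 rows is also an assumption here.

```python
import json
import os
import sqlite3
import tempfile

# Default threshold cited in the summary; treating it as inclusive is an assumption.
INLINE_ROW_LIMIT = 10


class ToyLake:
    """Toy lake: small writes go into the catalog, large writes become data files."""

    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.catalog = sqlite3.connect(":memory:")
        # Hypothetical catalog tables, not DuckLake's schema.
        self.catalog.execute("CREATE TABLE inlined_row (payload TEXT)")
        self.catalog.execute("CREATE TABLE data_file (path TEXT, row_count INTEGER)")

    def insert(self, rows):
        if len(rows) <= INLINE_ROW_LIMIT:
            # Small write: store the rows in the catalog, create no new file.
            self.catalog.executemany(
                "INSERT INTO inlined_row VALUES (?)",
                [(json.dumps(row),) for row in rows],
            )
        else:
            # Large write: create one data file and register it in the catalog.
            n = self.catalog.execute("SELECT COUNT(*) FROM data_file").fetchone()[0]
            path = os.path.join(self.data_dir, f"part-{n}.json")
            with open(path, "w") as fh:
                json.dump(rows, fh)
            self.catalog.execute(
                "INSERT INTO data_file VALUES (?, ?)", (path, len(rows))
            )
        self.catalog.commit()

    def file_count(self):
        return len(os.listdir(self.data_dir))

    def inlined_count(self):
        return self.catalog.execute("SELECT COUNT(*) FROM inlined_row").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    lake = ToyLake(tmp)
    for i in range(50):                             # 50 single-row writes
        lake.insert([{"id": i}])
    lake.insert([{"id": i} for i in range(1000)])   # one bulk write
    files_written = lake.file_count()
    rows_inlined = lake.inlined_count()

print(files_written, rows_inlined)
```

Without inlining, the 50 single-row writes would each produce their own tiny file. In this model they become 50 catalog rows, and only the bulk write creates a data file.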
Method
DuckLake stores table metadata in a SQL database, enabling features like data inlining for small inserts/updates/deletes, sorted tables, and bucket partitioning, with an initial implementation as a DuckDB extension.
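As a rough illustration of why bucket partitioning suits high-cardinality columns, the sketch below hashes each value into a fixed number of buckets, so the partition count stays bounded no matter how many distinct values exist. The function name `bucket_for`, the bucket count of 16, and the use of CRC32 are assumptions made for this example; the sketch does not claim to reproduce the hash function DuckLake uses.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical bucket count chosen for the example


def bucket_for(value: str, num_buckets: int) -> int:
    """Map a value to a bucket with a stable hash.

    CRC32 is a stand-in picked because it is in the standard library and,
    unlike Python's built-in hash() for strings, is stable across runs.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets


# A high-cardinality column: 10,000 distinct user IDs.
user_ids = [f"user-{i}" for i in range(10_000)]

# Partitioning by raw value would create one partition per user.
# Bucketing caps the number of partitions at NUM_BUCKETS.
assignment = Counter(bucket_for(uid, NUM_BUCKETS) for uid in user_ids)
partition_count = len(assignment)

print(partition_count)            # never more than NUM_BUCKETS
print(min(assignment.values()), max(assignment.values()))
```

Because the mapping is deterministic, a query that filters on a single user ID can compute that ID's bucket and read only the files in it.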
In practice
- Consider DuckLake where metadata operations are a bottleneck.
- Rely on data inlining for frequent small inserts, updates, and deletes (default threshold: 10 rows).
- Explore the DuckLake clients for DataFusion, Spark, Trino, or Pandas.
Topics
- DuckLake 1.0
- Data Lake Format
- SQL Catalog Metadata
- Data Inlining
- DuckDB Extension
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.