Predictive Optimization at Scale: A Year of Innovation and What’s Next
Summary
Databricks' Predictive Optimization (PO) in Unity Catalog, now enabled by default for new managed tables, automates data maintenance to improve lakehouse performance and reduce storage costs. Throughout 2025, PO saw significant adoption, vacuuming exabytes of unreferenced data, compacting hundreds of petabytes, and enabling Automatic Liquid Clustering for millions of tables. Key advancements include Automatic Statistics, which delivered up to 22% faster queries by keeping statistics accurate without manual intervention, and a VACUUM execution path that is 6x faster and 4x cheaper. PO coverage also expanded across the platform to include Lakeflow Spark Declarative Pipelines, supporting both Delta and Iceberg tables. Plans for 2026 include Auto-TTL for automated row deletion and improved observability via the Data Governance Hub, which will provide detailed ROI metrics for PO operations.
Key takeaway
For CTOs and VPs of Engineering managing large data estates, Databricks' Predictive Optimization replaces manual table tuning with autonomous data management. By adopting Unity Catalog managed tables with PO enabled, your teams can reduce operational burden and storage costs while improving query performance. Explore the Private Previews: the Data Governance Hub for deeper insight into PO's impact, and Auto-TTL to further automate data lifecycle management.
Key insights
Predictive Optimization automates data layout and maintenance for cost-effective, high-performance lakehouses.
Principles
- Automate data maintenance based on usage patterns.
- Optimize query performance through accurate statistics.
- Reduce storage costs by eliminating unreferenced data.
Method
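PO continuously analyzes data write and query patterns, then automatically runs OPTIMIZE, VACUUM, and ANALYZE and, through Automatic Liquid Clustering, selects clustering keys (the choice otherwise made by hand with CLUSTER BY), adapting to evolving workloads without manual tuning.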
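To make the method concrete, the sketch below lists the SQL statements a team would otherwise schedule by hand for a single table, alongside the one statement that hands that work to PO for a whole catalog or schema. It is a minimal illustration, not Databricks' implementation: the helper names (`manual_maintenance_sql`, `enable_po_sql`) and the table name `main.sales.orders` are hypothetical, and the code only builds SQL strings rather than connecting to a workspace.

```python
# Sketch only: builds SQL strings; it does not connect to a workspace.
# Helper names and the example table name are hypothetical.

VALID_SECURABLES = {"CATALOG", "SCHEMA"}


def manual_maintenance_sql(table: str) -> list[str]:
    """Return the statements a team would otherwise schedule by hand."""
    return [
        f"OPTIMIZE {table}",  # compact small files
        f"VACUUM {table}",  # remove unreferenced data files
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS",
    ]


def enable_po_sql(securable: str, name: str) -> str:
    """Return the statement that enables PO for a catalog or schema."""
    kind = securable.upper()
    if kind not in VALID_SECURABLES:
        raise ValueError(f"expected CATALOG or SCHEMA, got {securable!r}")
    return f"ALTER {kind} {name} ENABLE PREDICTIVE OPTIMIZATION"


if __name__ == "__main__":
    for statement in manual_maintenance_sql("main.sales.orders"):
        print(statement)
    print(enable_po_sql("schema", "main.sales"))
```

With PO enabled on the schema, the three per-table statements no longer need to be scheduled manually for managed tables in that schema.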
In practice
- Use Automatic Liquid Clustering for autonomous data layout.
- Leverage log-based VACUUM for faster, cheaper data deletion.
- Implement Auto-TTL for automated row deletion policies.
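As a rough sketch of the first practice above, the helper below generates the statements that opt existing tables into Automatic Liquid Clustering via `CLUSTER BY AUTO`, or out of clustering via `CLUSTER BY NONE`. The function name and table names are hypothetical, and the code only produces SQL strings. Auto-TTL is left out because it is in Private Preview and its syntax is not given in the source.

```python
# Sketch only: generates SQL strings for Automatic Liquid Clustering.
# The function name and the table names are hypothetical.


def clustering_sql(tables: list[str], automatic: bool = True) -> list[str]:
    """Return one ALTER TABLE statement per table.

    automatic=True lets the platform choose clustering keys (CLUSTER BY AUTO);
    automatic=False turns clustering off (CLUSTER BY NONE).
    """
    mode = "AUTO" if automatic else "NONE"
    return [f"ALTER TABLE {table} CLUSTER BY {mode}" for table in tables]


if __name__ == "__main__":
    for statement in clustering_sql(["main.sales.orders", "main.sales.items"]):
        print(statement)
```

Generating the statements from a list keeps the rollout reviewable: the output can be inspected or checked into version control before anything runs against a workspace.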
Topics
- Predictive Optimization
- Unity Catalog
- Automatic Liquid Clustering
- Data Lakehouse Optimization
- Data Lifecycle Management
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, MLOps Engineer, AI Operations Specialist
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.