Predictive Optimization at Scale: A Year of Innovation and What’s Next

2026-02-18 · Source: Databricks · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Databricks' Predictive Optimization (PO) in Unity Catalog, now enabled by default for new managed tables, automates data maintenance to improve lakehouse performance and reduce storage costs. Throughout 2025, PO saw broad adoption: it vacuumed exabytes of unreferenced data, compacted hundreds of petabytes, and enabled Automatic Liquid Clustering for millions of tables. Key advancements include Automatic Statistics, which keeps statistics accurate without manual intervention and delivered up to 22% faster queries, and a new VACUUM execution path that is 6x faster and 4x cheaper. PO also expanded its platform-wide coverage to Lakeflow Spark Declarative Pipelines and supports both Delta and Iceberg tables. Plans for 2026 include Auto-TTL for automated row deletion and improved observability through the Data Governance Hub, which will provide detailed ROI metrics for PO operations.
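
Automatic Statistics is described here only by its effect. One well-known way accurate statistics speed up queries is data skipping: the reader compares a query's filter with each file's recorded min/max values and skips files that cannot match. The minimal sketch below illustrates that general technique only. `FileStats` and `files_to_scan` are hypothetical names, not Databricks APIs, and the sketch does not model how Automatic Statistics decides when to collect or refresh statistics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStats:
    # Hypothetical per-file statistics for a single integer column.
    path: str
    min_val: Optional[int]  # None means no statistics were collected
    max_val: Optional[int]


def files_to_scan(files: list[FileStats], lo: int, hi: int) -> list[str]:
    """Return files that may contain rows with lo <= value <= hi."""
    selected = []
    for f in files:
        if f.min_val is None or f.max_val is None:
            # Without statistics the file cannot be ruled out, so it is scanned.
            selected.append(f.path)
        elif f.max_val >= lo and f.min_val <= hi:
            # The file's value range overlaps the filter, so it might match.
            selected.append(f.path)
    return selected


files = [
    FileStats("part-0", 0, 99),
    FileStats("part-1", 100, 199),
    FileStats("part-2", None, None),  # no statistics collected
]
print(files_to_scan(files, 120, 130))  # ['part-1', 'part-2']
```

A file without statistics is always read, so keeping statistics complete directly reduces the amount of data a filtered query has to scan.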

Key takeaway

For CTOs and VPs of Engineering managing large data estates, Databricks' Predictive Optimization offers a critical shift from manual table tuning to autonomous data management. By adopting Unity Catalog managed tables with PO enabled, your teams can significantly reduce operational burden and storage costs while improving query performance. Explore the Private Previews of Auto-TTL, to automate more of the data lifecycle, and the Data Governance Hub, to measure PO's impact in detail.

Key insights

Predictive Optimization automates data layout and maintenance for cost-effective, high-performance lakehouses.

Principles

Automate data maintenance based on usage patterns.
Optimize query performance through accurate statistics.
Reduce storage costs by eliminating unreferenced data.
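
The source describes log-based VACUUM only at a high level. The sketch below models the general idea under simplifying assumptions: a transaction log of hypothetical `LogAction` add/remove records in commit order, and a retention window measured against each file's most recent remove time. Instead of listing every object in storage and diffing it against the current snapshot, it derives deletion candidates directly from the log's remove actions. It is an illustration, not Databricks' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogAction:
    kind: str       # "add" or "remove" (hypothetical log record)
    path: str       # data file path
    timestamp: int  # commit time, seconds since epoch


def live_files(log: list[LogAction]) -> set[str]:
    """Replay the log to find files referenced by the latest snapshot."""
    live: set[str] = set()
    for action in log:
        if action.kind == "add":
            live.add(action.path)
        elif action.kind == "remove":
            live.discard(action.path)
    return live


def vacuum_candidates(log: list[LogAction], now: int, retention_secs: int) -> list[str]:
    """Files whose latest removal is older than the cutoff and that are not live.

    Candidates come from remove actions in the log, so no storage listing is needed.
    """
    cutoff = now - retention_secs
    live = live_files(log)
    last_removed: dict[str, int] = {}
    for action in log:
        if action.kind == "remove":
            last_removed[action.path] = action.timestamp  # keep the latest removal
    return sorted(p for p, ts in last_removed.items() if ts < cutoff and p not in live)


log = [
    LogAction("add", "a.parquet", 0),
    LogAction("add", "b.parquet", 0),
    LogAction("remove", "a.parquet", 100),  # compaction rewrote a -> c
    LogAction("add", "c.parquet", 100),
    LogAction("remove", "b.parquet", 900),  # recent rewrite b -> d
    LogAction("add", "d.parquet", 900),
]
print(vacuum_candidates(log, now=1000, retention_secs=500))  # ['a.parquet']
```

One trade-off of a purely log-driven pass: files written to storage but never committed to the log (for example, by a failed job) do not appear in it, so this sketch cannot find them without a storage listing.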

Method

PO continuously analyzes how each table is written and queried, then automatically runs OPTIMIZE, VACUUM, and ANALYZE and, for tables using Automatic Liquid Clustering, selects and updates their CLUSTER BY keys. It adapts to evolving workloads without manual tuning.
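
The source does not describe how PO decides which operation to run on which table. The following toy planner only illustrates the shape of usage-driven maintenance scheduling: it maps per-table signals to OPTIMIZE, VACUUM, and ANALYZE. The `TableSignals` fields, the `plan_maintenance` function, and every threshold are hypothetical, and PO itself learns from usage rather than applying fixed rules like these.

```python
from dataclasses import dataclass


@dataclass
class TableSignals:
    # Hypothetical per-table signals; PO's real inputs are not described in the source.
    small_file_count: int            # files well below a target size
    unreferenced_bytes: int          # bytes in files no longer in the snapshot
    rows_written_since_analyze: int
    total_rows: int
    reads_last_7d: int               # recent query activity on the table


def plan_maintenance(s: TableSignals,
                     small_file_limit: int = 100,
                     vacuum_bytes_limit: int = 1 << 30,
                     stale_fraction: float = 0.1) -> list[str]:
    """Return the maintenance operations this toy rule-based planner would run."""
    ops: list[str] = []
    # Compact only tables that are actually read; layout work on cold tables is wasted.
    if s.small_file_count > small_file_limit and s.reads_last_7d > 0:
        ops.append("OPTIMIZE")
    # Reclaim storage once enough unreferenced data has accumulated.
    if s.unreferenced_bytes > vacuum_bytes_limit:
        ops.append("VACUUM")
    # Refresh statistics when a large share of the table changed since the last run.
    if s.total_rows > 0 and s.rows_written_since_analyze / s.total_rows > stale_fraction:
        ops.append("ANALYZE")
    return ops


busy = TableSignals(500, 5 << 30, 2_000_000, 10_000_000, 42)
cold = TableSignals(500, 0, 0, 10_000_000, 0)
print(plan_maintenance(busy))  # ['OPTIMIZE', 'VACUUM', 'ANALYZE']
print(plan_maintenance(cold))  # []
```

Tying compaction to read activity captures the cost argument behind the method: maintenance is spent where queries benefit, not uniformly across every table.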

In practice

Use Automatic Liquid Clustering for autonomous data layout.
Leverage log-based VACUUM for faster, cheaper removal of unreferenced files.
Adopt Auto-TTL (in Private Preview) to enforce automated row-deletion policies.
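
Auto-TTL is in Private Preview and the source does not describe how it is configured. The sketch below shows only the general semantics of a row-level time-to-live policy: a row expires once its timestamp column is older than the retention window. The `apply_ttl` function, the `event_ts` column, and the 90-day window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def apply_ttl(rows: list[dict], ts_field: str, ttl: timedelta,
              now: datetime) -> tuple[list[dict], list[dict]]:
    """Split rows into (kept, expired); a row expires when its timestamp < now - ttl."""
    cutoff = now - ttl
    kept: list[dict] = []
    expired: list[dict] = []
    for row in rows:
        if row[ts_field] < cutoff:
            expired.append(row)
        else:
            kept.append(row)
    return kept, expired


now = datetime(2026, 2, 18, tzinfo=timezone.utc)
rows = [
    {"id": 1, "event_ts": datetime(2025, 10, 1, tzinfo=timezone.utc)},
    {"id": 2, "event_ts": datetime(2026, 2, 1, tzinfo=timezone.utc)},
]
kept, expired = apply_ttl(rows, "event_ts", timedelta(days=90), now)
print([r["id"] for r in kept], [r["id"] for r in expired])  # [2] [1]
```

In Delta tables, deleting rows does not immediately free storage; the underlying files are reclaimed later by VACUUM, so row-level TTL and automated VACUUM work together.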

Topics

Predictive Optimization
Unity Catalog
Automatic Liquid Clustering
Data Lakehouse Optimization
Data Lifecycle Management

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, MLOps Engineer, AI Operations Specialist

Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.