Are LLM agents good at join order optimization?
Summary
Databricks, in collaboration with UPenn, has developed a prototype LLM-powered agent to address the long-standing database challenge of join ordering in SQL queries. This agent operates offline, iteratively testing and refining join orders to improve query performance, unlike traditional optimizers that must make instantaneous decisions. Evaluated on the Join Order Benchmark (JOB) with a scaled-up IMDb dataset, the agent achieved a 1.288x geomean improvement in query latency and a 41% reduction in P90 latency compared to the standard Databricks optimizer. The agent's success is particularly notable in handling complex predicates like `LIKE`, which are difficult for traditional cardinality estimators, demonstrating its potential to autonomously repair and enhance database queries.
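The two headline metrics can be reproduced from per-query latencies. The sketch below is illustrative only: it assumes the geomean improvement is the geometric mean of per-query baseline-to-tuned latency ratios and that P90 is a nearest-rank percentile. Neither definition is confirmed here, and the function names are hypothetical.

```python
import math


def geomean_speedup(baseline, tuned):
    """Geometric mean of per-query speedups (baseline latency / tuned latency)."""
    ratios = [b / t for b, t in zip(baseline, tuned)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percentile(values, q):
    """Nearest-rank percentile (an assumed definition, for illustration)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def p90_reduction(baseline, tuned):
    """Fractional drop in P90 latency, e.g. 0.41 for a 41% reduction."""
    return 1 - percentile(tuned, 90) / percentile(baseline, 90)
```

A geometric mean keeps a few very slow queries from dominating the average, while the P90 figure shows how the slow tail changed.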
Key takeaway
Research scientists focused on database engine performance should investigate integrating offline LLM agents into their query optimization workflows. By iteratively refining join orders, this approach can significantly reduce query latency, especially for complex queries whose cardinalities are difficult to estimate. Two open design questions are how to define effective tools for the agent and when to trigger such optimizations to maximize performance gains.
Key insights
LLM agents can autonomously optimize database join orders offline, significantly improving query performance.
Principles
- Offline LLM agents can mimic human expert tuning.
- Iterative refinement improves query performance over time.
Method
A prototype query optimization agent uses a single tool to execute candidate join orders, returning runtime and subplan sizes. It performs 50 iterations, balancing exploitation and exploration, then selects the best-performing valid join order.
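The loop described above might be sketched roughly as follows. This is a minimal, self-contained illustration and not the Databricks prototype: the tool is replaced by a toy cost simulation (`run_join_order`), the LLM is replaced by a random proposer (`propose`), and the table sizes, selectivities, and every name in the code are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ToolResult:
    valid: bool
    runtime: float       # simulated seconds
    subplan_sizes: list  # rows produced by each intermediate join


# Hypothetical toy catalog: row counts and pairwise join selectivities.
ROWS = {"title": 1000, "cast_info": 5000, "name": 2000, "movie_info": 3000}
SELECTIVITY = {
    frozenset(("title", "cast_info")): 0.001,
    frozenset(("cast_info", "name")): 0.0005,
    frozenset(("title", "movie_info")): 0.001,
}


def run_join_order(order):
    """Stand-in for the agent's single tool: simulates a left-deep join order."""
    if sorted(order) != sorted(ROWS):
        return ToolResult(False, float("inf"), [])  # not a valid ordering
    size, joined, sizes = ROWS[order[0]], [order[0]], []
    for table in order[1:]:
        size *= ROWS[table]
        for prev in joined:
            size *= SELECTIVITY.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        sizes.append(size)
    return ToolResult(True, sum(sizes) * 1e-6, sizes)


def propose(best_order, rng, explore_prob=0.3):
    """Stand-in for the LLM: sometimes explore, otherwise tweak the best order."""
    candidate = list(best_order)
    if rng.random() < explore_prob:
        rng.shuffle(candidate)                       # exploration
    else:
        i, j = rng.sample(range(len(candidate)), 2)  # exploitation: local swap
        candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate


def optimize(initial_order, iterations=50, seed=0):
    """Run a fixed budget of tool calls and keep the fastest valid join order."""
    rng = random.Random(seed)
    best_order = list(initial_order)
    best = run_join_order(best_order)
    history = [(best_order, best)]
    for _ in range(iterations):
        candidate = propose(best_order, rng)
        result = run_join_order(candidate)
        history.append((candidate, result))
        if result.valid and result.runtime < best.runtime:
            best_order, best = candidate, result
    return best_order, best, history
```

Calling `optimize(["title", "movie_info", "name", "cast_info"])` returns the best order found, its simulated result, and the full history. In a real agent, that history of runtimes and subplan sizes is the feedback the LLM would read when choosing its next candidate.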
In practice
- Automate join order tuning for problematic queries.
- Identify systematic errors in default optimizers.
- Explore agent-based query optimization for complex predicates.
Topics
- LLM Agents
- Join Order Optimization
- Query Optimizers
- Cardinality Estimation
- Database Performance
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.