Are LLM agents good at join order optimization?

· Source: Databricks · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, medium

Summary

Databricks, in collaboration with UPenn, has developed a prototype LLM-powered agent to address the long-standing database challenge of join ordering in SQL queries. This agent operates offline, iteratively testing and refining join orders to improve query performance, unlike traditional optimizers that must make instantaneous decisions. Evaluated on the Join Order Benchmark (JOB) with a scaled-up IMDb dataset, the agent achieved a 1.288x geomean improvement in query latency and a 41% reduction in P90 latency compared to the standard Databricks optimizer. The agent's success is particularly notable in handling complex predicates like `LIKE`, which are difficult for traditional cardinality estimators, demonstrating its potential to autonomously repair and enhance database queries.
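To make the two headline metrics concrete, here is a minimal sketch of how a geomean improvement and a P90 latency reduction could be computed from per-query latencies. It assumes the geomean is taken over per-query baseline/agent latency ratios and that P90 uses the nearest-rank method; the summary does not state either detail, and the function names are hypothetical.

```python
import math


def geomean_speedup(baseline: list[float], tuned: list[float]) -> float:
    """Geometric mean of per-query speedups (baseline latency / tuned latency)."""
    ratios = [b / t for b, t in zip(baseline, tuned)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p90_reduction(baseline: list[float], tuned: list[float]) -> float:
    """Fractional drop in P90 latency, e.g. 0.41 would mean a 41% reduction."""
    p90_base = percentile_nearest_rank(baseline, 90)
    p90_tuned = percentile_nearest_rank(tuned, 90)
    return 1 - p90_tuned / p90_base
```

The geomean weights every query equally regardless of its absolute runtime, while the P90 figure reflects only the slow tail, which is why the two numbers can differ substantially.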

Key takeaway

Research scientists focused on database engine performance should investigate integrating offline LLM agents into their query optimization workflows. By iteratively refining join orders, this approach can significantly reduce query latency, especially for complex queries whose cardinalities are difficult to estimate. Two design questions matter most: which tools to expose to the agent, and when to trigger such optimizations to maximize performance gains.

Key insights

LLM agents can autonomously optimize database join orders offline, significantly improving query performance.

Method

A prototype query optimization agent uses a single tool that executes a candidate join order and returns its runtime and the sizes of its subplans. The agent performs 50 iterations, balancing exploitation and exploration, then selects the best-performing valid join order.
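The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Databricks implementation: the LLM is replaced by a hypothetical epsilon-greedy proposer, the execution tool is replaced by a toy simulated cost model, and all names (`run_agent`, `ToolResult`, the `t1`..`t4` tables and their row counts) are invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

Order = tuple[str, ...]


@dataclass
class ToolResult:
    runtime_s: float          # runtime reported by the tool for this candidate
    subplan_sizes: list[int]  # row counts of each intermediate join result
    valid: bool               # False if the candidate could not be executed


History = list[tuple[Order, ToolResult]]


def best_valid(history: History) -> Optional[Order]:
    """Return the fastest valid join order seen so far, or None."""
    valid = [(order, res) for order, res in history if res.valid]
    if not valid:
        return None
    return min(valid, key=lambda item: item[1].runtime_s)[0]


def run_agent(
    propose: Callable[[History], Order],
    execute: Callable[[Order], ToolResult],
    iterations: int = 50,
) -> tuple[Optional[Order], History]:
    """Propose, execute, and record candidates, then pick the best valid one."""
    history: History = []
    for _ in range(iterations):
        candidate = propose(history)
        history.append((candidate, execute(candidate)))
    return best_valid(history), history


def make_simulated_executor(rows: dict[str, int], selectivity: float = 1e-3):
    """Stand-in for the execution tool: a toy left-deep cost model (assumption)."""
    def execute(order: Order) -> ToolResult:
        if sorted(order) != sorted(rows):
            # A candidate that omits or repeats a table is treated as invalid.
            return ToolResult(float("inf"), [], valid=False)
        sizes = []
        current = rows[order[0]]
        for table in order[1:]:
            current = max(1, int(current * rows[table] * selectivity))
            sizes.append(current)
        return ToolResult(sum(sizes) * 1e-6, sizes, valid=True)
    return execute


def make_epsilon_greedy_proposer(tables: list[str], explore_prob: float = 0.3, seed: int = 0):
    """Stand-in for the LLM: explore randomly, or tweak the current best order."""
    rng = random.Random(seed)

    def propose(history: History) -> Order:
        incumbent = best_valid(history)
        if incumbent is None or rng.random() < explore_prob:
            order = list(tables)          # exploration: a fresh random order
            rng.shuffle(order)
            return tuple(order)
        order = list(incumbent)           # exploitation: swap two tables
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        return tuple(order)
    return propose


rows = {"t1": 1000, "t2": 50, "t3": 200_000, "t4": 10}  # hypothetical table sizes
best, history = run_agent(
    make_epsilon_greedy_proposer(list(rows)), make_simulated_executor(rows)
)
print(best, len(history))
```

In the real setting, `propose` would be an LLM call that reads the history of runtimes and subplan sizes, and `execute` would run the query with the candidate join order; only the surrounding loop and the final selection step are taken from the description.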

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.