Are LLM agents good at join order optimization?

2026-04-22 · Source: Databricks · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, medium

Summary

Databricks, in collaboration with UPenn, has developed a prototype LLM-powered agent that tackles join ordering, a long-standing problem in SQL query optimization. A traditional optimizer must choose a plan instantly at query time. The agent instead works offline, iteratively executing and refining candidate join orders to improve query performance. On the Join Order Benchmark (JOB) run against a scaled-up IMDb dataset, the agent improved query latency by 1.288x (geometric mean) and reduced P90 latency by 41% relative to the standard Databricks optimizer. It is especially effective on queries with complex predicates such as `LIKE`, which traditional cardinality estimators handle poorly. This suggests that agents could autonomously repair and improve slow database queries.
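The two headline numbers are standard aggregates over per-query latencies. Below is a minimal sketch of how they can be computed. The latencies are made up for illustration and are not the benchmark's results. The nearest-rank percentile is one common definition and is an assumption here, because the source does not say which definition was used.

```python
import math


def geomean_speedup(baseline_ms, agent_ms):
    """Geometric mean of per-query speedups (baseline latency / agent latency)."""
    ratios = [b / a for b, a in zip(baseline_ms, agent_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percentile(values, p):
    """Nearest-rank percentile (assumed definition; others interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def p90_reduction(baseline_ms, agent_ms):
    """Fractional drop in P90 latency when moving from baseline to agent plans."""
    b90 = percentile(baseline_ms, 90)
    a90 = percentile(agent_ms, 90)
    return (b90 - a90) / b90


# Hypothetical per-query latencies in milliseconds, for illustration only.
baseline = [100.0, 200.0, 400.0, 800.0, 1600.0]
agent = [100.0, 100.0, 400.0, 400.0, 800.0]
print(f"geomean speedup: {geomean_speedup(baseline, agent):.3f}x")
print(f"P90 reduction: {p90_reduction(baseline, agent):.0%}")
```

A geometric mean is the usual choice for speedups because it weights a 2x gain and a 2x loss symmetrically, so a few very long queries cannot dominate the summary the way they would in an arithmetic mean.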

Key takeaway

Research scientists focused on database engine performance should investigate integrating offline LLM agents into their query optimization workflows. By iteratively refining join orders, this approach can substantially reduce query latency, especially for complex queries whose cardinalities are hard to estimate. Consider how to define effective tools for the agent and when to trigger such optimizations to maximize performance gains.

Key insights

LLM agents can autonomously optimize database join orders offline, significantly improving query performance.

Principles

Offline LLM agents can mimic human expert tuning.
Iterative refinement improves query performance over time.

Method

A prototype query optimization agent uses a single tool to execute candidate join orders, returning runtime and subplan sizes. It performs 50 iterations, balancing exploitation and exploration, then selects the best-performing valid join order.
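The loop described above can be sketched as follows, under several loudly stated assumptions. The real agent uses an LLM to propose join orders. Here the LLM is swapped for a simple epsilon-greedy proposer: with probability `explore_prob` it tries a fresh random order (exploration), and otherwise it swaps two tables in the best order found so far (exploitation). The tool `execute_join_order`, the toy schema in `CARD` and `SEL`, the cost model (runtime as the sum of intermediate result sizes), treating cross products as "invalid", and starting from the default optimizer's order are all hypothetical. The tool returns subplan sizes as the source describes, but this toy proposer ignores them. An LLM agent could use those sizes to spot misestimated joins.

```python
import math
import random

# Hypothetical toy schema (not real benchmark statistics): table cardinalities
# and selectivities of the join predicates that connect pairs of tables.
CARD = {"title": 1000, "cast_info": 5000, "name": 2000, "movie_keyword": 3000, "keyword": 100}
SEL = {
    frozenset({"title", "cast_info"}): 0.001,
    frozenset({"cast_info", "name"}): 0.0005,
    frozenset({"title", "movie_keyword"}): 0.001,
    frozenset({"movie_keyword", "keyword"}): 0.01,
}


def execute_join_order(order):
    """Hypothetical stand-in for the agent's single tool.

    Returns (runtime, subplan_sizes), or None when the order would need a cross
    product (treated as invalid in this sketch). Runtime is simulated as the sum
    of intermediate result sizes.
    """
    joined = [order[0]]
    size = float(CARD[order[0]])
    subplan_sizes = []
    for table in order[1:]:
        preds = [SEL[frozenset({t, table})] for t in joined if frozenset({t, table}) in SEL]
        if not preds:
            return None
        size *= CARD[table] * math.prod(preds)
        joined.append(table)
        subplan_sizes.append(size)
    return sum(subplan_sizes), subplan_sizes


def propose(best, tables, explore_prob, rng):
    """Epsilon-greedy proposal: random order (explore) or swap two tables in the best order (exploit)."""
    if best is None or rng.random() < explore_prob:
        candidate = list(tables)
        rng.shuffle(candidate)
        return candidate
    candidate = list(best)
    i, j = rng.sample(range(len(candidate)), 2)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate


def optimize(initial_order, budget=50, explore_prob=0.3, seed=0):
    """Spend `budget` tool calls, starting from the given order, and keep the fastest valid order seen."""
    rng = random.Random(seed)
    best_order, best_runtime = None, math.inf
    candidate = list(initial_order)
    for _ in range(budget):
        result = execute_join_order(candidate)
        if result is not None:
            runtime, _subplan_sizes = result  # sizes unused by this toy proposer
            if runtime < best_runtime:
                best_order, best_runtime = candidate, runtime
        candidate = propose(best_order, initial_order, explore_prob, rng)
    return best_order, best_runtime


# Hypothetical starting order, standing in for the default optimizer's plan.
start = ["name", "cast_info", "title", "movie_keyword", "keyword"]
order, runtime = optimize(start)
print(order, runtime)
```

Seeding the search with the optimizer's own plan means the result is never worse than the baseline under this cost model. Because every candidate is actually executed, the selected order is backed by a measured runtime rather than an estimate.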

In practice

Automate join order tuning for problematic queries.
Identify systematic errors in default optimizers.
Explore agent-based query optimization for complex predicates.
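One way to look for systematic optimizer errors is to compare the optimizer's estimated subplan sizes with the actual sizes that the execution tool reports. The sketch below uses the q-error (the larger of estimate/actual and actual/estimate), a common cardinality-estimation metric that the source does not mention. The subplan names, the numbers, and the threshold of 10 are all hypothetical.

```python
def q_error(estimated, actual):
    """Symmetric misestimation ratio (>= 1; 1 means the estimate was exact).

    Both values are clamped to at least 1 to avoid division by zero on empty results.
    """
    estimated, actual = max(estimated, 1.0), max(actual, 1.0)
    return max(estimated / actual, actual / estimated)


def flag_misestimates(estimates, actuals, threshold=10.0):
    """Return (subplan, q-error) pairs whose q-error exceeds the threshold, worst first."""
    flagged = [
        (name, q_error(estimates[name], actuals[name]))
        for name in estimates
        if name in actuals and q_error(estimates[name], actuals[name]) > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical numbers in which a join downstream of a LIKE filter is badly underestimated.
estimates = {"title-cast_info": 5_000, "title-movie_keyword": 120, "movie_keyword-keyword": 900}
actuals = {"title-cast_info": 6_000, "title-movie_keyword": 48_000, "movie_keyword-keyword": 1_100}
print(flag_misestimates(estimates, actuals))
```

If the same kind of predicate keeps appearing among flagged subplans across many queries, that pattern points to a systematic estimator weakness rather than a one-off mistake.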

Topics

LLM Agents
Join Order Optimization
Query Optimizers
Cardinality Estimation
Database Performance

Code references

RyanMarcus/imdb_pg_dataset

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.