AgentNLQ: A General-Purpose Agent for Natural Language to SQL

2026-05-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

AgentNLQ, a multi-agent method for Natural Language to SQL (NL2SQL) conversion developed by JPMorganChase, achieves 78.1% semantic accuracy on the BIRD benchmark. It targets the problem of generating accurate SQL from natural-language questions, a task on which LLMs still often fall short of human experts. AgentNLQ combines three components: an optimized multi-agent orchestrator that plans, reflects, and self-corrects; an advanced schema enrichment method that produces context-aware metadata; and a multi-model agent configuration using OpenAI GPT-4o and Anthropic Claude Opus 4.1. Accuracy rose from a 60.2% baseline to 68–72% with orchestration and vector search, to 76.4% with multi-model agents, and to 78.1% with the custom orchestrator and structured context compression. The system also reduced token usage by about 19.8% on the BIRD-bench financial dataset. Functional guardrails block data-modifying (DML) statements, protecting database integrity.
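The summary names structured context compression as the final step but does not describe how it works. A minimal sketch, assuming "structured" means folding older conversation turns into typed fields (questions asked, tables touched, SQL tried, errors seen) while keeping the most recent turns verbatim. `Turn`, `CompressedContext`, `compress`, and all field names are hypothetical; this is not AgentNLQ's implementation.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool"
    content: str

@dataclass
class CompressedContext:
    questions: list[str] = field(default_factory=list)
    tables_seen: set[str] = field(default_factory=set)
    sql_tried: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    recent: list[Turn] = field(default_factory=list)

    def render(self) -> str:
        """Serialize to a compact, fixed-layout prompt section."""
        header = [
            "QUESTIONS: " + " | ".join(self.questions),
            "TABLES: " + ", ".join(sorted(self.tables_seen)),
            "SQL_TRIED: " + " | ".join(self.sql_tried),
            "ERRORS: " + " | ".join(self.errors),
        ]
        return "\n".join(header + [f"{t.role}: {t.content}" for t in self.recent])

# Table names following FROM or JOIN; a real system would use a SQL parser.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)

def compress(history: list[Turn], keep_last: int = 2) -> CompressedContext:
    """Fold all but the last `keep_last` turns into structured fields."""
    split = max(len(history) - keep_last, 0)
    ctx = CompressedContext(recent=list(history[split:]))
    for turn in history[:split]:
        text = turn.content.strip()
        if turn.role == "user":
            ctx.questions.append(text)
        elif turn.role == "assistant" and text.upper().startswith(("SELECT", "WITH")):
            ctx.sql_tried.append(text)
            ctx.tables_seen.update(name.lower() for name in TABLE_RE.findall(text))
        elif turn.role == "tool" and text.upper().startswith("ERROR"):
            ctx.errors.append(text)
        # Anything else (free-form replies, raw result rows) is dropped.
    return ctx
```

Keeping older context in fixed, labeled fields rather than free-text summaries makes it cheap to render and easy for a downstream agent to scan; the trade-off is that anything no field captures, such as full result tables, is discarded.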

Key takeaway

AI engineers building enterprise NL2SQL solutions should prioritize multi-agent architectures that combine execution-grounded feedback with automated schema enrichment. AgentNLQ's dual-ledger orchestrator reached 78.1% semantic accuracy on BIRD while keeping latency and token costs in check. Include functional guardrails to prevent unintended database modifications, and consider a multi-model agent setup that assigns specialized tasks to different models.
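The source does not say how AgentNLQ's guardrails are implemented. A minimal sketch of one common two-layer approach, assuming a SQLite backend: a conservative lexical check that admits only a single `SELECT`/`WITH` statement, backed by a connection that SQLite itself opens read-only (`mode=ro`). `GuardrailViolation`, `check_read_only`, and `run_read_only` are hypothetical names.

```python
import re
import sqlite3
from pathlib import Path

class GuardrailViolation(Exception):
    """Raised when generated SQL is not a single read-only query."""

_STARTS_READ_ONLY = re.compile(r"^(SELECT|WITH)\b", re.IGNORECASE)
# Conservative: may also reject harmless queries whose string literals contain these words.
_WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|ATTACH|DETACH|PRAGMA|VACUUM|REINDEX)\b",
    re.IGNORECASE,
)

def check_read_only(sql: str) -> str:
    """Layer 1: lexical check on the generated SQL; returns the cleaned query."""
    query = sql.strip().rstrip(";").strip()
    if ";" in query:
        raise GuardrailViolation("multiple statements are not allowed")
    if not _STARTS_READ_ONLY.match(query):
        raise GuardrailViolation("query must start with SELECT or WITH")
    if _WRITE_KEYWORDS.search(query):
        raise GuardrailViolation("data-modifying or DDL keyword found")
    return query

def run_read_only(db_path: str, sql: str) -> list[tuple]:
    """Layer 2: execute on a connection that SQLite itself opens read-only."""
    query = check_read_only(sql)
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The lexical layer gives an early, explainable rejection that an orchestrator can feed back to the generator; the read-only connection is the actual enforcement, since a keyword list can miss dialect-specific write paths.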

Key insights

Multi-agent orchestration with enriched schema and execution feedback significantly improves NL2SQL accuracy and efficiency.

Principles

Combine fast (System 1) and slow (System 2) reasoning loops.
Ground LLM feedback in direct SQL execution results.
Automate schema enrichment for context-aware metadata.
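The summary describes AgentNLQ's schema enrichment only as an automated, offline step that produces context-aware metadata. A minimal sketch of the deterministic profiling such a step could start from, assuming a SQLite database: for each column it records the declared type, null rate, distinct count, and a few sample values, then renders one prompt line. Whether AgentNLQ also has an LLM turn such profiles into natural-language descriptions is not stated here; `ColumnProfile` and `profile_schema` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ColumnProfile:
    table: str
    column: str
    declared_type: str
    null_fraction: float
    distinct_count: int
    samples: list

    def to_prompt_line(self) -> str:
        """Render one compact metadata line for the generator's prompt."""
        return (f"{self.table}.{self.column} ({self.declared_type or 'ANY'}): "
                f"{self.distinct_count} distinct, {self.null_fraction:.0%} null, "
                f"e.g. {self.samples}")

def _q(name: str) -> str:
    """Quote an identifier so names that clash with SQL keywords still work."""
    return '"' + name.replace('"', '""') + '"'

def profile_schema(conn: sqlite3.Connection, n_samples: int = 3) -> list[ColumnProfile]:
    """Offline pass: collect type, null rate, cardinality, and samples per column."""
    profiles = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        t = _q(table)
        total = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({t})").fetchall()
        for _cid, col, col_type, _notnull, _default, _pk in columns:
            c = _q(col)
            nulls, distinct = conn.execute(
                f"SELECT SUM({c} IS NULL), COUNT(DISTINCT {c}) FROM {t}").fetchone()
            samples = [row[0] for row in conn.execute(
                f"SELECT DISTINCT {c} FROM {t} WHERE {c} IS NOT NULL ORDER BY {c} LIMIT ?",
                (n_samples,))]
            null_fraction = (nulls or 0) / total if total else 0.0
            profiles.append(ColumnProfile(table, col, col_type, null_fraction,
                                          distinct, samples))
    return profiles
```

Because this runs offline, its cost is paid once per schema change rather than once per question.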

Method

AgentNLQ uses an offline pipeline for metadata generation and an inference stage with a multi-agent orchestrator, SQL generator, and executor, employing a reason-generate-evaluate-replan loop with dual ledgers.
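A minimal sketch of the reason-generate-evaluate-replan loop, assuming the dual ledgers are loosely modeled on the task-ledger/progress-ledger split in Microsoft's Magentic-One orchestrator: a task ledger holds the question, learned facts, and plan, and a progress ledger records each attempt. Evaluation is grounded in actually executing the SQL; treating an empty result as a failure is an added assumption. The generator is a scripted stand-in for an LLM agent, and all names here are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TaskLedger:
    """Slow-changing state: the question, facts learned, and the current plan."""
    question: str
    facts: list[str] = field(default_factory=list)
    plan: str = "draft SQL, execute it, repair from feedback"

@dataclass
class ProgressLedger:
    """Fast-changing state: every attempt and what execution said about it."""
    attempts: list[tuple[str, str]] = field(default_factory=list)

# Hypothetical signature for the LLM-backed SQL generator agent.
Generator = Callable[[TaskLedger, ProgressLedger], str]

def evaluate(conn: sqlite3.Connection, sql: str) -> tuple[bool, list, str]:
    """Ground evaluation in execution: errors and empty results count as failures."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, [], f"execution error: {exc}"
    if not rows:
        return False, [], "query returned no rows"
    return True, rows, f"ok: {len(rows)} row(s)"

def solve(conn: sqlite3.Connection, question: str, generate: Generator, max_iters: int = 3):
    task, progress = TaskLedger(question), ProgressLedger()
    for _ in range(max_iters):
        sql = generate(task, progress)              # reason + generate
        ok, rows, feedback = evaluate(conn, sql)    # evaluate against the database
        progress.attempts.append((sql, feedback))
        if ok:
            return sql, rows, progress
        task.facts.append(f"attempt failed: {feedback}")  # replan with the new fact
        task.plan = "revise the last query using the recorded facts"
    return None, [], progress

def stub_generate(task: TaskLedger, progress: ProgressLedger) -> str:
    """Scripted stand-in for an LLM: guesses a wrong column, then fixes it
    once the task ledger records a 'no such column' error."""
    if any("no such column" in fact for fact in task.facts):
        return "SELECT SUM(amount) FROM loan"
    return "SELECT SUM(amt) FROM loan"
```

In a deployment, `evaluate` would run against a read-only connection, and `generate` would call the SQL-generator agent with both ledgers rendered into its prompt.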

In practice

Implement functional guardrails to prevent DML operations in NL2SQL systems.
Use vector search to retrieve relevant schema elements when the full schema exceeds the context length.
Employ structured context compression to manage long conversation histories.
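A minimal sketch of schema retrieval that only kicks in when the full schema would overflow the prompt budget. Assumptions: schema elements are one-line descriptions, lowercase word counts stand in for a real embedding model and vector index, and whitespace-split words stand in for tokens. `embed`, `cosine`, `approx_tokens`, and `select_schema` are hypothetical helpers.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.
    snake_case and dotted identifiers split into parts (loan.amount -> loan, amount)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())

def select_schema(question: str, elements: list[str], budget: int) -> list[str]:
    """Send the whole schema if it fits; otherwise fall back to retrieval,
    greedily packing the most question-similar elements into the budget."""
    if sum(approx_tokens(e) for e in elements) <= budget:
        return list(elements)
    q = embed(question)
    ranked = sorted(elements, key=lambda e: cosine(q, embed(e)), reverse=True)
    chosen, used = [], 0
    for element in ranked:
        cost = approx_tokens(element)
        if used + cost <= budget:
            chosen.append(element)
            used += cost
    return chosen
```

The greedy packing skips elements that do not fit rather than stopping at the first overflow, so a short, lower-ranked element can still use leftover budget.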

Topics

NL2SQL
Multi-Agent Systems
LLM Orchestration
Schema Enrichment
BIRD Benchmark
SQL Generation
Context Management

Code references

gkamradt/LLMTest_NeedleInAHaystack

Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.