AgentNLQ: A General-Purpose Agent for Natural Language to SQL

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

AgentNLQ, a multi-agent method for Natural Language to SQL (NL2SQL) conversion developed by JPMorganChase, achieves 78.1% semantic accuracy on the BIRD benchmark. It targets the problem of generating accurate SQL queries from natural language questions, a task on which LLMs still often fall short of human experts. AgentNLQ combines three components: an optimized multi-agent orchestrator that plans, reflects, and self-corrects; a schema enrichment method that produces context-aware metadata; and a multi-model agent configuration that uses OpenAI GPT-4o and Anthropic Claude Opus 4.1. Accuracy rose from a 60.2% baseline to 68-72% with orchestration and vector search, to 76.4% with multi-model agents, and to 78.1% with the custom orchestrator and structured context compression. The system also reduced token usage by approximately 19.8% on the BIRD-bench financial dataset. Functional guardrails block DML operations to protect database integrity.

Key takeaway

AI engineers building enterprise NL2SQL solutions should prioritize multi-agent architectures that incorporate execution-grounded feedback and automated schema enrichment. A dual-ledger orchestrator like AgentNLQ's reached 78.1% semantic accuracy on BIRD while keeping latency and token costs in check. Include functional guardrails that prevent unintended database modifications, and consider assigning different models to specialized agent roles to maximize performance.
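
The guardrail recommendation can be illustrated with a minimal sketch. The summary does not describe how AgentNLQ implements its guardrails, so everything here is an assumption: the example uses Python's standard `sqlite3` authorizer hook to deny any action other than reads, and the `accounts` table and helper names (`make_read_only`, `run_generated_sql`) are hypothetical.

```python
import sqlite3

# Authorizer action codes that a plain read-only SELECT needs.
_READ_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow reads; deny every other action (INSERT, UPDATE, DELETE, DDL, ...)."""
    return sqlite3.SQLITE_OK if action in _READ_ACTIONS else sqlite3.SQLITE_DENY


def make_read_only(conn):
    """Install the guardrail; call this after any setup writes are done."""
    conn.set_authorizer(_read_only_authorizer)
    return conn


def run_generated_sql(conn, sql):
    """Execute model-generated SQL; return (rows, None) or (None, error message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


# Hypothetical demo table, populated before the guardrail is installed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100.0,), (250.5,)])
conn.commit()
make_read_only(conn)

rows, err = run_generated_sql(conn, "SELECT COUNT(*) FROM accounts")
blocked_rows, blocked_err = run_generated_sql(conn, "DELETE FROM accounts")
```

In this sketch the `SELECT` returns its rows and the `DELETE` comes back as an error string instead of modifying the table. For a production database, a read-only role or connection is typically the stronger control, with a check like this as a second layer.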

Key insights

Multi-agent orchestration with enriched schema and execution feedback significantly improves NL2SQL accuracy and efficiency.

Method

AgentNLQ uses an offline pipeline for metadata generation and an inference stage with a multi-agent orchestrator, SQL generator, and executor, employing a reason-generate-evaluate-replan loop with dual ledgers.
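
The generate-evaluate-replan loop described above can be sketched roughly as follows. This is not AgentNLQ's implementation: the summary does not define the two ledgers, so the sketch assumes a task ledger (question, schema notes, plan) and a progress ledger (attempted queries and execution feedback). The `stub_generator` function is a hypothetical stand-in for an LLM call, and the `loans` table is invented for the demo.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class TaskLedger:
    """Assumed contents: what is being asked and the current plan."""
    question: str
    schema_notes: list = field(default_factory=list)
    plan: list = field(default_factory=list)


@dataclass
class ProgressLedger:
    """Assumed contents: each attempted query with its execution feedback."""
    attempts: list = field(default_factory=list)  # (sql, error or None)

    def last_error(self):
        return self.attempts[-1][1] if self.attempts else None


def stub_generator(task, progress):
    """Stand-in for an LLM call: the first guess is wrong, the retry is corrected."""
    if progress.last_error() is None:
        return "SELECT SUM(amount) FROM loans"   # wrong column name
    return "SELECT SUM(principal) FROM loans"    # revised after feedback


def execute(conn, sql):
    """Run the query and return (rows, None) or (None, error message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def orchestrate(conn, task, generate, max_rounds=3):
    """Generate a query, evaluate it by executing it, and replan on failure."""
    progress = ProgressLedger()
    for _ in range(max_rounds):
        sql = generate(task, progress)
        rows, error = execute(conn, sql)
        progress.attempts.append((sql, error))
        if error is None:
            return rows, progress
        task.plan.append(f"Revise query after error: {error}")
    return None, progress


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, principal REAL)")
conn.executemany("INSERT INTO loans (principal) VALUES (?)", [(1000.0,), (2500.0,)])
conn.commit()

task = TaskLedger(question="What is the total loan principal?",
                  schema_notes=["loans.principal: original loan amount"])
result, progress = orchestrate(conn, task, stub_generator)
```

The point of the sketch is the control flow: the database's error message is written to the progress ledger, and the next generation step reads it. A real system would pass both ledgers into the model prompt in place of the hard-coded retry.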

Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.