AgentNLQ: A General-Purpose Agent for Natural Language to SQL
Summary
AgentNLQ, a new multi-agent method for Natural Language to SQL (NL2SQL) conversion developed by JPMorganChase, achieves 78.1% semantic accuracy on the BIRD benchmark. This solution addresses the challenge of generating accurate SQL queries from natural language questions, a task where LLMs often fall short compared to human experts. AgentNLQ integrates an optimized multi-agent orchestrator that plans, reflects, and self-corrects, an advanced schema enrichment method for context-aware metadata, and a multi-model agent configuration utilizing OpenAI GPT-4o and Anthropic Claude Opus 4.1. The system improved from a 60.2% baseline by incorporating orchestration and vector search (68-72%), then multi-model agents (76.4%), and finally its custom orchestrator with structured context compression (78.1%). It also reduced token usage by approximately 19.8% on the BIRD-bench financial dataset. Functional guardrails prevent DML operations, ensuring database integrity.
Key takeaway
AI engineers building enterprise NL2SQL solutions should prioritize multi-agent architectures that incorporate execution-grounded feedback and automated schema enrichment. A dual-ledger orchestrator like AgentNLQ's reached 78.1% semantic accuracy on BIRD while managing latency and token costs. Include functional guardrails to prevent unintended database modifications, and consider multi-model agents for specialized tasks.
Key insights
Multi-agent orchestration with enriched schema and execution feedback improves NL2SQL accuracy (from a 60.2% baseline to 78.1% on BIRD) and efficiency (approximately 19.8% fewer tokens on the BIRD-bench financial dataset).
Principles
- Combine fast (System 1) and slow (System 2) reasoning loops.
- Ground LLM feedback in direct SQL execution results.
- Automate schema enrichment for context-aware metadata.
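As an illustration of the second principle, the sketch below runs a candidate query and turns the outcome into text that could be passed back to the model. It is a minimal example under stated assumptions, not AgentNLQ's implementation: the function name `execution_feedback`, the returned dictionary shape, and the toy `account` table are all hypothetical, and SQLite stands in for whatever database the real system targets.

```python
import sqlite3

def execution_feedback(conn, sql, max_rows=3):
    """Run a candidate query and describe the outcome as text for the next LLM turn.

    Hypothetical helper: the name and return shape are assumptions for illustration.
    """
    try:
        cur = conn.execute(sql)
        rows = cur.fetchmany(max_rows)
    except sqlite3.Error as exc:
        # The database's own error message is the grounded signal.
        return {"ok": False, "feedback": f"Execution failed: {exc}"}
    if not rows:
        return {"ok": True, "feedback": "Query ran but returned no rows; check filters and joins."}
    columns = [d[0] for d in cur.description]
    return {"ok": True, "feedback": f"Columns {columns}; sample rows {rows}"}

# Toy database (assumed schema, for demonstration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER, district TEXT)")
conn.execute("INSERT INTO account VALUES (1, 'Prague')")

bad = execution_feedback(conn, "SELECT distrct FROM account")   # misspelled column
good = execution_feedback(conn, "SELECT district FROM account")
print(bad["feedback"])
print(good["feedback"])
```

The point of the design is that the feedback comes from the database rather than from the model's own judgment of its query, so a misspelled column or an empty result is reported as a fact.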
Method
AgentNLQ uses an offline pipeline for metadata generation and an inference stage with a multi-agent orchestrator, SQL generator, and executor, employing a reason-generate-evaluate-replan loop with dual ledgers.
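The reason-generate-evaluate-replan loop can be sketched roughly as follows. The summary does not define the two ledgers, so this example assumes a task ledger (the question and current plan) and a progress ledger (past attempts and their feedback). The class names, `run_loop`, and the stub planner, generator, and evaluator are hypothetical placeholders for LLM agents and a real executor.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TaskLedger:
    # Assumed contents: the user question and the current plan.
    question: str
    plan: List[str] = field(default_factory=list)

@dataclass
class ProgressLedger:
    # Assumed contents: (sql, feedback) pairs from earlier rounds.
    attempts: List[Tuple[str, str]] = field(default_factory=list)

def run_loop(question, plan_fn, generate_fn, evaluate_fn, max_rounds=3):
    """Plan, generate SQL, evaluate it, and replan until success or the round limit."""
    task = TaskLedger(question)
    progress = ProgressLedger()
    for _ in range(max_rounds):
        task.plan = plan_fn(task, progress)      # reason / replan
        sql = generate_fn(task, progress)        # generate
        ok, feedback = evaluate_fn(sql)          # evaluate
        progress.attempts.append((sql, feedback))
        if ok:
            return sql, progress
    return None, progress

# Stubs standing in for LLM agents and a SQL executor (illustration only).
def plan_fn(task, progress):
    if not progress.attempts:
        return ["identify tables", "write query"]
    return ["read last feedback", "revise query"]

def generate_fn(task, progress):
    # First attempt contains a typo; the retry fixes it.
    return "SELECT COUNT(*) FROM account" if progress.attempts else "SELECT COUNT(*) FROM acount"

def evaluate_fn(sql):
    ok = "FROM account" in sql
    return ok, ("ok" if ok else "no such table: acount")

sql, progress = run_loop("How many accounts are there?", plan_fn, generate_fn, evaluate_fn)
print(sql, len(progress.attempts))
```

Keeping the plan and the attempt history in separate structures lets the planner consult past failures without the question and plan being buried in a long transcript.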
In practice
- Implement functional guardrails to prevent DML operations in NL2SQL systems.
- Use vector search to retrieve relevant schema elements when the full schema exceeds the model's context length.
- Employ structured context compression to manage long conversation histories.
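A functional guardrail of the kind named in the first bullet might look like the sketch below. It is a deliberately conservative, keyword-based check, not AgentNLQ's actual guardrail: the function name `is_read_only` and the keyword list are assumptions, and the list also blocks DDL statements such as DROP, which goes beyond the DML restriction the summary mentions.

```python
import re

# Assumed deny-list; extend it for the SQL dialect in use.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REPLACE\s+INTO)\b",
    re.IGNORECASE,
)

def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    body = sql.strip().rstrip(";")
    if ";" in body:
        # Reject stacked statements such as "SELECT 1; DROP TABLE t".
        return False
    if not re.match(r"(?i)^\s*(SELECT|WITH)\b", body):
        return False
    return FORBIDDEN.search(body) is None

print(is_read_only("SELECT * FROM loan;"))
print(is_read_only("WITH t AS (SELECT 1) SELECT * FROM t"))
print(is_read_only("DELETE FROM loan"))
print(is_read_only("SELECT 1; DROP TABLE loan"))
```

A check like this can reject valid queries, for example one whose string literal contains the word "delete" or a semicolon. It is best treated as a first filter in front of a database connection that has read-only permissions.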
Topics
- NL2SQL
- Multi-Agent Systems
- LLM Orchestration
- Schema Enrichment
- BIRD Benchmark
- SQL Generation
- Context Management
Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.