Production-Grade Text-to-SQL Agent with Claude Code, LangGraph, Langfuse, FastAPI and Qdrant
Summary
This article details the construction of a production-grade Text-to-SQL agent, leveraging Claude Sonnet 4.6 for SQL generation and self-correction, and Claude Haiku 4.5 for hallucination evaluation. The system integrates a 7-node LangGraph 0.2 state machine, a RAG pipeline with Voyage AI voyage-3 embeddings and Qdrant for schema and knowledge base retrieval, and a FastAPI backend with a Streamlit UI. It features robust SQL validation using sqlglot, read-only PostgreSQL execution, and a self-correction loop with up to 3 retries. Observability is provided by Langfuse SDK v4, tracing every agent run and node-level latency. The agent is tested against a realistic UDogRetail dataset and evaluated using GEval for SQL correctness, schema faithfulness, and explanation hallucination. All components are containerized with Docker and Docker Compose, ensuring a reproducible and scalable deployment.
Key takeaway
For AI Engineers building production-grade Text-to-SQL agents, prioritize a robust RAG pipeline and explicit state management. You should implement pre-execution SQL validation and a self-correction loop to enhance reliability and security. Integrate comprehensive observability like Langfuse to trace agent behavior and token usage. Rigorous evaluation with LLM-as-judge metrics is crucial for preventing hallucinations and ensuring schema faithfulness. This guides your prompt tuning and architectural decisions effectively.
Key insights
Building robust Text-to-SQL agents requires RAG, explicit state machines, and rigorous evaluation to prevent hallucinations and ensure reliability.
Principles
- Ground SQL generation in retrieved schema context.
- Explicitly label critical constraints in prompts.
- Enforce read-only access for SQL execution.
Method
The agent uses a 7-node LangGraph state machine: RETRIEVE (Qdrant), GENERATE (Claude), VALIDATE (sqlglot), EXECUTE (PostgreSQL), EXPLAIN, CORRECT (up to 3 retries), CLARIFY.
In practice
- Use "include_raw=True" for LLM calls to log token counts.
- Implement sqlglot for pre-execution SQL validation.
- Sanitize Decimal objects to float for JSON serialization.
Topics
- Text-to-SQL
- LLM Agents
- LangGraph
- Retrieval-Augmented Generation
- Langfuse
- Qdrant
Code references
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Naturallanguageprocessing on Medium.