I Turned an Archived 23K-Star Text-to-SQL Project Into a Self-Hosted Tool That Actually Works Out…
Summary
DataChat is a new self-hosted text-to-SQL chat interface forked from the archived Vanna.ai project. Its goal is to generate SQL queries from natural-language questions out of the box. The original Vanna.ai had gathered more than 23,000 GitHub stars before it was archived in March 2026, which left self-hosters without a maintained option. DataChat fixes several problems in the original codebase: there was no schema explorer, schema refreshes had to be triggered manually, the database was hardcoded so switching required code changes, the frontend build process was complex, and complex data types caused serialization crashes. In response, DataChat adds a full schema sidebar, automatic schema refresh, database switching from the command line, and an automated frontend build. It works with cloud LLMs such as Gemini (gemini-2.5-flash) and with local LLMs served through Ollama (mistral-small3.1:latest). The local option lets users keep their data entirely local. DataChat is MIT licensed and currently supports PostgreSQL and BigQuery.
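The summary does not say how DataChat patched the serialization crashes. One common approach, sketched here under that caveat, is a `default` hook for Python's `json.dumps` that converts the values database drivers typically return for NUMERIC, timestamp, interval, UUID, and binary columns. The names `to_jsonable` and `dump_rows` are hypothetical, not DataChat's code.

```python
import json
import uuid
from datetime import date, datetime, time, timedelta
from decimal import Decimal


def to_jsonable(value):
    """json.dumps `default` hook: convert types json cannot encode natively."""
    if isinstance(value, Decimal):
        # NUMERIC columns arrive as Decimal; a string keeps full precision.
        return str(value)
    if isinstance(value, (datetime, date, time)):
        # ISO 8601 text, e.g. "2026-03-01T09:30:00".
        return value.isoformat()
    if isinstance(value, timedelta):
        # Intervals become a number of seconds.
        return value.total_seconds()
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, (bytes, bytearray, memoryview)):
        # Binary columns become a hex string.
        return bytes(value).hex()
    # Anything else is still an error, so unexpected types surface loudly.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def dump_rows(rows):
    """Serialize a list of row dicts for the chat frontend (hypothetical helper)."""
    return json.dumps(rows, default=to_jsonable)
```

Encoding `Decimal` as a string rather than a float is a deliberate choice here: it avoids silently rounding monetary or high-precision values, at the cost of the frontend having to parse them if it needs arithmetic.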
Key takeaway
For AI engineers and MLOps teams looking for a self-hosted text-to-SQL tool, DataChat removes much of the friction of running the archived Vanna.ai. Consider deploying it to give non-SQL users access to your data: setup and frontend builds are automated, the integrated schema explorer shows which tables and columns are available, and support for local LLMs through Ollama keeps data on your own infrastructure, which improves privacy.
Key insights
DataChat transforms an archived text-to-SQL project into a functional, self-hosted tool by resolving key usability and technical issues.
Principles
- Automate setup complexities.
- Provide clear database context.
- Ensure data type compatibility.
Method
The author forked the Vanna.ai 2.0.2 codebase, implemented a schema sidebar, automated schema refreshes and frontend builds, and patched serialization issues to create DataChat.
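The method does not describe how the schema sidebar or the automatic refresh works internally. As a hypothetical sketch: group rows from PostgreSQL's `information_schema.columns` into a schema → table → column tree that a sidebar could render, and drop a cached copy whenever an executed statement starts with `CREATE`, `ALTER`, or `DROP`. `SCHEMA_QUERY`, `build_schema_tree`, `SchemaCache`, and the DDL regex are illustrative names and assumptions, not DataChat's implementation.

```python
import re
from collections import defaultdict

# A valid PostgreSQL catalog query; how DataChat actually reads the schema is not stated.
SCHEMA_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position
"""

# Assumption: a statement changes the schema if, after any leading "--" comment
# lines, it starts with CREATE, ALTER, or DROP.
DDL_PATTERN = re.compile(r"^\s*(?:--[^\n]*\n\s*)*(CREATE|ALTER|DROP)\b", re.IGNORECASE)


def build_schema_tree(rows):
    """Group (schema, table, column, data_type) rows into {schema: {table: [(column, type)]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for schema, table, column, data_type in rows:
        tree[schema][table].append((column, data_type))
    # Convert to plain dicts so the result serializes cleanly.
    return {schema: dict(tables) for schema, tables in tree.items()}


class SchemaCache:
    """Cache the schema tree and discard it after DDL so the next read refreshes it."""

    def __init__(self, fetch_rows):
        # fetch_rows: zero-argument callable that runs SCHEMA_QUERY and returns rows.
        self._fetch_rows = fetch_rows
        self._tree = None

    def get(self):
        if self._tree is None:
            self._tree = build_schema_tree(self._fetch_rows())
        return self._tree

    def after_query(self, sql):
        # Call after each executed statement; only DDL invalidates the cache.
        if DDL_PATTERN.match(sql):
            self._tree = None
```

Invalidating on DDL rather than polling keeps the sidebar current after the user's own `CREATE` or `ALTER` statements, but it will not notice changes made by other clients; a periodic refresh would cover that case.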
In practice
- Use `python start.py --database analytics` for database switching.
- Set `LLM_PROVIDER=ollama` for local LLM inference.
- Delete `dist/` to force a frontend rebuild.
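The three practices above can be tied together in a small launcher. This sketch is hypothetical: only the `--database` flag, the `LLM_PROVIDER` variable, the two model names, and the "delete `dist/` to rebuild" behavior come from the article. The function names, the `gemini` default provider, and the `frontend` directory are assumptions.

```python
import argparse
import os
from pathlib import Path

# Model names are the ones the article mentions; pairing them with
# LLM_PROVIDER values this way is an assumption.
DEFAULT_MODELS = {
    "gemini": "gemini-2.5-flash",
    "ollama": "mistral-small3.1:latest",
}


def parse_args(argv=None):
    """Parse launcher flags; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Hypothetical DataChat-style launcher")
    parser.add_argument("--database", help="name of the database to connect to")
    return parser.parse_args(argv)


def resolve_llm(env=None):
    """Pick the provider from LLM_PROVIDER (assumed default: gemini) and its model."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "gemini").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    return provider, DEFAULT_MODELS[provider]


def needs_frontend_build(frontend_dir):
    """Build only when dist/ is missing, so deleting dist/ forces a rebuild."""
    return not (Path(frontend_dir) / "dist").is_dir()


def plan_startup(argv=None, env=None, frontend_dir="frontend"):
    """Collect the startup decisions without side effects."""
    args = parse_args(argv)
    provider, model = resolve_llm(env)
    return {
        "database": args.database,
        "provider": provider,
        "model": model,
        "build_frontend": needs_frontend_build(frontend_dir),
    }
```

For example, `plan_startup(["--database", "analytics"], {"LLM_PROVIDER": "ollama"})` selects the `analytics` database and the `mistral-small3.1:latest` model. Keeping the decisions in a pure function separates them from side effects such as running the frontend build or starting the server, which a real launcher would perform afterwards.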
Topics
- Text-to-SQL
- DataChat
- Vanna.ai
- Self-Hosted Tools
- Large Language Models
Best for: AI Engineer, MLOps Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.