To Describe or Not to Describe? Benchmarking Database Representations for Schema Linking in Text-to-SQL
Summary
A study presented at PROPOR 2026 by Daiane Ucceli Kreitlow and Hilário Tomaz Alves de Oliveira investigates Schema Linking for Text-to-SQL systems that answer questions in Brazilian Portuguese. It compares two schema representation strategies: natural-language descriptions generated by Large Language Models (LLMs) and representations derived from Data Definition Language (DDL) and Data Manipulation Language (DML) commands. Experiments were run on a Brazilian Portuguese adaptation of the Spider dataset, which includes over 200 databases, using several LLMs and embedding models and reporting Hit@k metrics. Natural-language descriptions consistently outperformed the DDL/DML-based representations, indicating that LLM-generated schema descriptions are the more effective representation for Schema Linking in Text-to-SQL.
Key takeaway
AI engineers building Text-to-SQL systems, especially for non-English languages such as Brazilian Portuguese, should prioritize using Large Language Models to generate natural-language descriptions of database schemas. In the study, this approach consistently outperformed DDL/DML-based representations on schema linking, that is, on identifying the databases, tables, and columns relevant to a question.
Key insights
LLM-generated natural-language descriptions support Text-to-SQL schema linking more effectively than DDL/DML-based representations.
Principles
- Natural language descriptions improve schema linking.
- LLMs can generate effective schema representations.
Method
The study compared LLM-generated natural-language descriptions against DDL/DML-based representations for schema linking on a Brazilian Portuguese adaptation of the Spider dataset, scoring each representation with Hit@k metrics.
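To make the metric concrete, here is a minimal Python sketch of Hit@k under its common definition: a question counts as a hit if at least one gold item appears among the top k retrieved candidates, and the score is the fraction of questions that hit. The summary does not give the paper's exact formulation, so treat this definition as an assumption. The function names and the database names in the usage lines are hypothetical.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Return 1.0 if any relevant item is among the top-k ranked items, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Sequence[Sequence[str]],
    gold: Sequence[set[str]],
    k: int,
) -> float:
    """Average Hit@k over questions (assumed definition, not taken from the paper)."""
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not rankings:
        return 0.0
    hits = [hit_at_k(r, g, k) for r, g in zip(rankings, gold)]
    return sum(hits) / len(hits)


# Illustrative data only: one ranked candidate list per question.
rankings = [
    ["db_music", "db_school", "db_cars"],
    ["db_pets", "db_music", "db_cars"],
]
gold = [{"db_music"}, {"db_cars"}]

score_at_1 = mean_hit_at_k(rankings, gold, k=1)  # first question hits, second misses
score_at_3 = mean_hit_at_k(rankings, gold, k=3)  # both questions hit
```

The same function works at any granularity (databases, tables, or columns), as long as the ranked candidates and the gold items use the same identifiers.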
In practice
- Use LLMs for database schema descriptions.
- Prioritize natural language for schema linking.
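The two recommendations above can be sketched as a small retrieval pipeline. The Python below is an illustration under stated assumptions, not the authors' implementation: the toy schema, the prompt wording, and all function names are hypothetical, and a bag-of-words cosine similarity stands in for the embedding models used in the study. The LLM call itself is left out; `description_prompt` only builds the text that would be sent to a model.

```python
import math
import re
from collections import Counter

# Hypothetical toy schema; table and column names are illustrative only.
SCHEMA = {
    "singer": ["singer_id", "name", "country"],
    "concert": ["concert_id", "singer_id", "year"],
}


def ddl_representation(schema: dict[str, list[str]]) -> str:
    """Render the schema as simplified CREATE TABLE statements (types omitted)."""
    return "\n".join(
        f"CREATE TABLE {table} ({', '.join(columns)});"
        for table, columns in schema.items()
    )


def description_prompt(schema: dict[str, list[str]]) -> str:
    """Build a prompt asking an LLM for a plain-language schema description."""
    return (
        "Describe in plain language what this database stores, "
        "mentioning each table and its columns:\n" + ddl_representation(schema)
    )


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for a real embedding model."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(question: str, representations: dict[str, str]) -> list[str]:
    """Order database names by similarity between the question and each text."""
    return sorted(
        representations,
        key=lambda name: cosine(question, representations[name]),
        reverse=True,
    )
```

In use, each database would be indexed once by its representation text (either the DDL string or the description returned by the LLM), and `rank` would be called per question. Swapping the representation while keeping the ranking step fixed is what allows the two strategies to be compared with a metric such as Hit@k.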
Topics
- Text-to-SQL
- Schema Linking
- Large Language Models
- Database Representations
- Brazilian Portuguese
Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.