Understanding, Detecting, and Repairing Real-World In-Context-Learning-Based Text-to-SQL Errors

2024-05-13 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Expert, extended

Summary

A comprehensive study finds that Large Language Models (LLMs) performing text-to-SQL tasks with in-context learning (ICL) frequently generate erroneous SQL: 37.3% of queries contain errors across four ICL-based techniques, two benchmarks (Spider and Bird), and two LLM settings (GPT-3.5-Turbo-0125 and GPT-4o-2024-05-13). The researchers grouped 29 error types into 7 categories and found that 26.0% of errors are format-related and 30.9% are semantic. Existing repair methods improve correctness only modestly (10.9-23.3% of errors fixed), at high computational cost (1.03-3.82x latency), and introduce 5.3-40.1% new errors. To address this, the authors developed MapleRepair, a detection and repair framework. It repairs 13.8% more queries with 84.9% fewer mis-repairs and 67.4% less overhead, processing queries in 1.2 seconds.

Key takeaway

If you are an AI Scientist or Machine Learning Engineer building text-to-SQL solutions, treat LLM-generated SQL as prone to specific, classifiable errors. Prioritize integrating rule-based error detection and repair mechanisms such as MapleRepair. Compared with existing repair methods that lean on LLM self-correction, the study reports 84.9% fewer mis-repairs and 67.4% less overhead for this approach, which improves both the reliability and the efficiency of a text-to-SQL system.

Key insights

Errors in LLM-generated SQL queries are widespread but fall into identifiable categories, so repair should be targeted and efficient.

Principles

LLMs struggle with SQL syntax and schema comprehension.
Execution feedback significantly aids SQL error repair.
Untargeted LLM-based repair can worsen SQL errors.

Method

MapleRepair uses a multi-stage, rule-based detection and repair system, selectively invoking LLMs for complex errors. It prioritizes fixing syntax, schema, logic, and convention errors before addressing semantic issues.
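The staged idea described above can be sketched in a few lines of Python. This is a minimal illustration, not MapleRepair's actual implementation: the `Finding` class, the three example rules, the stage order, and the `llm_repair` callback are all hypothetical names chosen for this sketch. It shows stages running in order, rules fixing what they safely can, and the LLM being called only when a stage detects a problem that no rule can fix.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass
class Finding:
    stage: str                       # stage that flagged the query
    message: str                     # description passed on to the LLM if needed
    fixed_sql: Optional[str] = None  # set when a rule can repair the query itself


Stage = Callable[[str], Optional[Finding]]


def strip_answer_label(sql: str) -> Optional[Finding]:
    """Format rule (hypothetical): drop a leading 'SQL:' label from the reply."""
    cleaned = re.sub(r"^\s*sql\s*:\s*", "", sql, flags=re.IGNORECASE)
    if cleaned != sql:
        return Finding("format", "leading 'SQL:' label", cleaned)
    return None


def make_table_check(known_tables: Set[str]) -> Stage:
    """Schema rule (hypothetical): flag FROM/JOIN tables missing from the schema."""
    def check(sql: str) -> Optional[Finding]:
        for name in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, flags=re.IGNORECASE):
            if name.lower() not in known_tables:
                # No safe rule-based fix here, so fixed_sql stays None.
                return Finding("schema", f"unknown table '{name}'")
        return None
    return check


def fix_null_comparison(sql: str) -> Optional[Finding]:
    """Logic rule (hypothetical): '= NULL' never matches a row; use 'IS NULL'."""
    fixed = re.sub(r"(?<![<>!=])=\s*NULL\b", "IS NULL", sql, flags=re.IGNORECASE)
    if fixed != sql:
        return Finding("logic", "'= NULL' comparison", fixed)
    return None


def repair(sql: str, stages: Iterable[Stage],
           llm_repair: Callable[[str, Finding], str]) -> str:
    """Run stages in order; call the LLM only when a rule cannot fix a finding."""
    for stage in stages:
        finding = stage(sql)
        if finding is None:
            continue
        if finding.fixed_sql is not None:
            sql = finding.fixed_sql          # cheap, deterministic repair
        else:
            sql = llm_repair(sql, finding)   # targeted LLM call with the finding
    return sql
```

With stages ordered as `[strip_answer_label, make_table_check({"users"}), fix_null_comparison]`, a query such as `SQL: SELECT name FROM users WHERE age = NULL` is repaired entirely by rules, while `SELECT name FROM user` is handed to `llm_repair` together with the schema finding.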

In practice

Implement rule-based checks for common SQL syntax errors.
Provide execution results and value specifications to LLMs.
Avoid blanket LLM re-generation for all SQL queries.
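One way to act on these three points, assuming a SQLite database (the engine behind Spider and Bird), is to let the database itself flag broken queries and to build a repair prompt only for those. The sketch below uses Python's standard `sqlite3` module; the helper names, the prompt wording, and the `values` mapping are assumptions made for illustration and are not taken from the paper. It only catches errors SQLite reports at compile time, so semantic errors would need additional checks.

```python
import sqlite3
from typing import Dict, List, Optional


def detect_error(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return SQLite's error message for `sql`, or None if it compiles.

    EXPLAIN makes SQLite parse and plan the statement without running it,
    so syntax errors and unknown tables or columns surface cheaply.
    """
    try:
        conn.execute("EXPLAIN " + sql)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    return None


def build_repair_prompt(question: str, sql: str, error: str,
                        values: Dict[str, List[str]]) -> str:
    """Assemble a repair prompt carrying the database feedback and sample values."""
    lines = [
        "The SQL query below failed. Fix only the reported problem.",
        f"Question: {question}",
        f"Query: {sql}",
        f"Database error: {error}",
        "Example values per column:",
    ]
    lines += [f"  {col}: {', '.join(vals)}" for col, vals in values.items()]
    return "\n".join(lines)


def prompt_if_broken(conn: sqlite3.Connection, question: str, sql: str,
                     values: Dict[str, List[str]]) -> Optional[str]:
    """Return a repair prompt only for queries the database rejects."""
    error = detect_error(conn, sql)
    if error is None:
        return None  # query compiles: skip the LLM call entirely
    return build_repair_prompt(question, sql, error, values)
```

Queries that compile never reach the LLM, which is the point of avoiding blanket re-generation; queries that fail carry the database's own message and a few example values into the prompt.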

Topics

Text-to-SQL
Large Language Models
In-Context Learning
SQL Error Detection
SQL Error Repair
MapleRepair
Database Query Generation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.