Understanding, Detecting, and Repairing Real-World In-Context-Learning-Based Text-to-SQL Errors

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics) · Depth: Expert, extended

Summary

A comprehensive study finds that large language models (LLMs) performing text-to-SQL through in-context learning (ICL) frequently generate erroneous SQL: 37.3% of queries contain errors across four ICL-based techniques, two benchmarks (Spider and Bird), and two LLM settings (GPT-3.5-Turbo-0125 and GPT-4o-2024-05-13). The researchers group 29 error types into 7 categories, finding that 26.0% are format-related and 30.9% are semantic. Existing repair methods offer limited correctness improvement (10.9-23.3% fixed) at high computational cost (1.03-3.82x latency) and introduce 5.3-40.1% new errors. To address this, the authors developed MapleRepair, a detection and repair framework. It repairs 13.8% more queries with 84.9% fewer mis-repairs and 67.4% less overhead, processing queries in 1.2 seconds.

Key takeaway

If you are an AI scientist or machine learning engineer building text-to-SQL solutions, treat LLM-generated SQL as prone to specific, classifiable errors. Prioritize integrating rule-based error detection and repair mechanisms such as MapleRepair. Compared with relying solely on LLM self-correction, this approach significantly reduces mis-repairs and computational overhead, improving the reliability and efficiency of your text-to-SQL systems.

Key insights

Errors in LLM-generated SQL queries are widespread and fall into identifiable categories, which calls for targeted, efficient repair solutions.

Method

MapleRepair uses a multi-stage, rule-based detection and repair system, selectively invoking LLMs for complex errors. It prioritizes fixing syntax, schema, logic, and convention errors before addressing semantic issues.
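
The idea of rule-first repair with a selective LLM fallback can be illustrated with a minimal Python sketch. This is not MapleRepair's implementation: the toy schema, the two rules (stripping a leading "SQL:" label and fuzzy-matching an unknown column name), and every function name below are assumptions made for illustration, and only the format and schema stages are represented. Detection relies on SQLite's own compiler via the standard `sqlite3` module.

```python
import difflib
import re
import sqlite3
from typing import Callable, List, Optional

# Hypothetical toy schema; Spider and Bird ship real databases instead.
SCHEMA = "CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER);"


def strip_answer_prefix(sql: str) -> str:
    """Format rule (assumed): drop a leading 'SQL:' label left by the model."""
    return re.sub(r"^\s*sql\s*:\s*", "", sql, flags=re.IGNORECASE).strip()


def detect_error(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return SQLite's error message for `sql`, or None if it compiles."""
    try:
        conn.execute("EXPLAIN " + sql)  # compiles the query without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)


def schema_columns(conn: sqlite3.Connection) -> List[str]:
    """Collect every column name from every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    columns: List[str] = []
    for table in tables:
        columns += [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return columns


def repair_unknown_column(conn: sqlite3.Connection, sql: str, error: str) -> Optional[str]:
    """Schema rule (assumed): swap an unknown column for its closest real name."""
    match = re.match(r"no such column: (?:\w+\.)?(\w+)", error)
    if not match:
        return None
    bad = match.group(1)
    close = difflib.get_close_matches(bad, schema_columns(conn), n=1)
    if not close:
        return None
    return re.sub(rf"\b{re.escape(bad)}\b", close[0], sql)


def repair(sql: str, conn: sqlite3.Connection,
           llm_fallback: Optional[Callable[[str, str], str]] = None,
           max_rounds: int = 3) -> str:
    """Apply cheap rules first; call the LLM only if an error survives them."""
    sql = strip_answer_prefix(sql)
    error = detect_error(conn, sql)
    for _ in range(max_rounds):
        if error is None:
            return sql
        fixed = repair_unknown_column(conn, sql, error)
        if fixed is None:
            break  # no rule applies to this error
        sql = fixed
        error = detect_error(conn, sql)
    if error is not None and llm_fallback is not None:
        return llm_fallback(sql, error)  # hypothetical callback wrapping an LLM call
    return sql


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(repair("SQL: SELECT nme FROM singer WHERE age > 30", conn))
```

In this sketch the misspelled column is fixed without any model call, while an error that no rule recognizes (for example, an unknown table) is handed to `llm_fallback` together with the database's error message. Keeping the LLM as a last resort is what limits both latency and the chance of a mis-repair on queries the rules already handle.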

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.