Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Progress-SQL is a multi-turn reinforcement learning framework designed to improve Text-to-SQL generation by large language models. It introduces an Oracle-guided Diagnostic Tree (ODT) that provides clause-level structural feedback for iterative SQL refinement. The framework uses a progressive reward system that combines ODT-based structural alignment with lexical alignment, a progression latency reward for reaching correctness earlier, and an execution status reward for recovering from invalid SQL. Experiments on BIRD, Spider, and Spider robustness variants show that Progress-SQL improves 7B backbone models by an average of 8.5% in execution accuracy and 6.3% in test-suite accuracy on Spider Dev, and that it consistently outperforms existing RL methods.
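The summary names several reward signals without giving their formulas. The sketch below shows one way they could be folded into a single scalar, assuming a token-level F1 as the lexical alignment measure and a weighted sum with made-up weights. `lexical_f1`, `total_reward`, and the `WEIGHTS` values are illustrative assumptions, not the paper's definitions.

```python
import re
from collections import Counter


def _tokens(sql: str) -> list[str]:
    """Lowercase word tokens plus single punctuation/operator characters."""
    return re.findall(r"\w+|[^\w\s]", sql.lower())


def lexical_f1(pred_sql: str, gold_sql: str) -> float:
    """Token-level F1 between predicted and gold SQL (an assumed lexical metric)."""
    pred, gold = Counter(_tokens(pred_sql)), Counter(_tokens(gold_sql))
    overlap = sum((pred & gold).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"structural": 1.0, "lexical": 0.5, "latency": 0.5,
           "execution": 0.5, "format": 0.1}


def total_reward(components: dict[str, float],
                 weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of reward components; missing components count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in weights.items())
```

For example, `total_reward({"structural": 0.5, "lexical": lexical_f1(pred, gold)})` treats absent terms as zero. A weighted sum keeps each signal's contribution easy to inspect, but the relative weights would need tuning in practice.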

Key takeaway

For machine learning engineers developing Text-to-SQL solutions, adopting a multi-turn reinforcement learning approach with progressive rewards can significantly enhance model accuracy and robustness. You should consider integrating Oracle-guided Diagnostic Trees (ODT) to provide fine-grained, clause-level feedback during training, fostering iterative SQL refinement. This strategy improves first-attempt SQL generation and recovery from invalid queries, leading to more reliable systems.

Key insights

Multi-turn reinforcement learning with progressive rewards and ODT feedback significantly improves Text-to-SQL generation.

Principles

Multi-turn RL with diagnostic feedback enhances SQL refinement.
Progressive rewards capture trajectory-level improvement.
Early correctness and execution recovery are crucial for robust RL.
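The three principles can be made concrete as small reward functions. This is a minimal sketch under stated assumptions: per-turn alignment scores in [0, 1] are already available, the progressive term is the per-turn change in score, latency decays linearly with the turn index of the first correct answer, and the execution term rewards a valid query that follows an invalid one. All three functional forms are hypothetical.

```python
from typing import Optional, Sequence


def progressive_rewards(turn_scores: Sequence[float]) -> list[float]:
    """Per-turn reward = change in alignment score versus the previous turn.

    The first turn is measured against 0, so the rewards sum to the final
    score while still crediting each intermediate improvement.
    """
    rewards, prev = [], 0.0
    for score in turn_scores:
        rewards.append(score - prev)
        prev = score
    return rewards


def latency_reward(correct_flags: Sequence[bool], max_turns: int) -> float:
    """1.0 if the first turn is correct, decaying linearly per extra turn; 0.0 if never correct."""
    for turn, correct in enumerate(correct_flags):
        if correct:
            return 1.0 - turn / max_turns
    return 0.0


def execution_reward(prev_valid: Optional[bool], curr_valid: bool) -> float:
    """-1 for invalid SQL, +1 for recovering from an invalid previous turn, else 0."""
    if not curr_valid:
        return -1.0
    return 1.0 if prev_valid is False else 0.0
```

Because each turn is scored against the previous one, `progressive_rewards([0.25, 0.5, 1.0])` gives `[0.25, 0.25, 0.5]`, crediting each repair step rather than only the final state.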

Method

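Progress-SQL uses multi-turn rollouts in which the model generates and revises SQL queries over several turns. After each turn, the ODT provides clause-level structural feedback. The reward combines four terms: progressive alignment (ODT-based structural plus lexical alignment), progression latency, execution status, and format.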
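A multi-turn rollout of this kind can be sketched as a loop that asks a policy for SQL, executes it, and feeds a message back until the result matches the oracle query or the turn budget runs out. This sketch assumes an SQLite database, a hypothetical `Policy` callable standing in for the LLM, and plain-text feedback in place of ODT diagnoses. The `valid` and `correct` flags it records are the kind of inputs that latency and execution-status rewards would consume.

```python
import sqlite3
from collections import Counter
from typing import Callable, Optional

# Hypothetical policy interface: (question, feedback or None) -> SQL string.
Policy = Callable[[str, Optional[str]], str]


def run_sql(conn: sqlite3.Connection, sql: str) -> tuple[bool, object]:
    """Execute SQL; return (True, rows) on success or (False, error text)."""
    try:
        return True, conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)


def rollout(policy: Policy, question: str, gold_sql: str,
            conn: sqlite3.Connection, max_turns: int = 3) -> list[dict]:
    """Collect up to max_turns SQL attempts, stopping once results match gold."""
    gold_ok, gold_rows = run_sql(conn, gold_sql)
    if not gold_ok:
        raise ValueError(f"gold SQL failed: {gold_rows}")
    trajectory: list[dict] = []
    feedback: Optional[str] = None
    for turn in range(max_turns):
        sql = policy(question, feedback)
        valid, result = run_sql(conn, sql)
        # Order-insensitive comparison of result rows (multiset equality).
        correct = valid and Counter(result) == Counter(gold_rows)
        trajectory.append({"turn": turn, "sql": sql,
                           "valid": valid, "correct": correct})
        if correct:
            break
        # Placeholder feedback; Progress-SQL would use ODT clause diagnoses here.
        feedback = f"execution error: {result}" if not valid else "wrong result"
    return trajectory


def scripted_policy(candidates: list[str]) -> Policy:
    """Hypothetical test double that replays fixed SQL strings and ignores feedback."""
    it = iter(candidates)
    return lambda question, feedback: next(it)
```

With `scripted_policy(["SELECT nme FROM singer", "SELECT name FROM singer WHERE age > 30"])` against a table `singer(name, age)`, the first turn is recorded as invalid (no such column) and the second as valid; it is marked correct only if its rows match the gold query's rows. Comparing result multisets with `Counter` ignores row order, which is appropriate only when the question does not ask for a specific ordering.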

In practice

Implement ODT for fine-grained SQL error diagnosis.
Design rewards that credit trajectory-level improvement, not just the final state.
Incorporate latency and executability rewards for more robust RL training.
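The ODT itself is not specified in this summary. As a rough, hypothetical stand-in for clause-level diagnosis, the sketch below splits a flat SQL query into top-level clauses and labels each one against the oracle (gold) query. It ignores nested subqueries and keyword-like text inside string literals, normalizes only case and whitespace, and compares clauses as flat strings rather than building a tree. `split_clauses`, `diagnose`, and `structural_score` are illustrative names, not the paper's API.

```python
import re

# Top-level clause keywords for a flat (non-nested) query; an assumption of this sketch.
CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_PATTERN = re.compile(
    r"\b(" + "|".join(c.replace(" ", r"\s+") for c in CLAUSES) + r")\b",
    re.IGNORECASE,
)


def split_clauses(sql: str) -> dict[str, str]:
    """Map each clause keyword to its lowercased, whitespace-collapsed body."""
    parts = _PATTERN.split(sql.strip().rstrip(";"))
    clauses = {}
    # re.split with a capture group yields [prefix, kw1, body1, kw2, body2, ...].
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clauses[" ".join(keyword.upper().split())] = " ".join(body.lower().split())
    return clauses


def diagnose(pred_sql: str, gold_sql: str) -> dict[str, str]:
    """Label each clause present in either query as match, mismatch, missing, or extra."""
    pred, gold = split_clauses(pred_sql), split_clauses(gold_sql)
    report = {}
    for clause in CLAUSES:
        if clause in gold and clause in pred:
            report[clause] = "match" if pred[clause] == gold[clause] else "mismatch"
        elif clause in gold:
            report[clause] = "missing"
        elif clause in pred:
            report[clause] = "extra"
    return report


def structural_score(pred_sql: str, gold_sql: str) -> float:
    """Fraction of diagnosed clauses that match exactly; 0.0 if nothing to compare."""
    report = diagnose(pred_sql, gold_sql)
    if not report:
        return 0.0
    return sum(label == "match" for label in report.values()) / len(report)
```

For the gold query `SELECT name FROM singer WHERE age > 30 ORDER BY name` and the prediction `SELECT name FROM singer WHERE age >= 30`, `diagnose` labels SELECT and FROM as `match`, WHERE as `mismatch`, and ORDER BY as `missing`, so `structural_score` returns 0.5. Labels like these are the kind of clause-level hint that could be turned into feedback text for the next turn.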

Topics

Text-to-SQL
Reinforcement Learning
Large Language Models
SQL Generation
Oracle-guided Diagnostic Tree
Progressive Rewards

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.