The Power of Constraints in Natural Language to SQL Translation

Authors:

Tonghui Ren, Chen Ke, Yuankai Fan, Yinan Jing, Zhenying He, Kai Zhang, X. Sean Wang


Abstract

Current large language model (LLM)-based Natural Language to SQL (NL2SQL) approaches typically rely on the database schema and partial data values for the translation. These approaches are unable to use sufficient data for accurate database understanding due to limitations in data selection methods, and they cannot input the entire database due to the limited context window sizes of LLMs. This insufficient data integration may result in an incomplete understanding of the database, leading to semantically incorrect SQL generation. In this paper, we introduce REDSQL, a novel plug-and-play framework that refines the predicted SQL by utilizing the entire database in the refinement process. The core idea of REDSQL is to enhance SQL refinement by identifying potential errors based on the database content, which is achieved by applying constraints on the input relations of query operations. LLMs can refine the SQL using SQL-related information extracted by REDSQL, which provides concise and informative insights into the database. Additionally, REDSQL enhances schema semantics by integrating data profiling for more effective database utilization. Our experiments demonstrate that REDSQL consistently improves the performance of existing NL2SQL approaches across five benchmarks. Specifically, REDSQL elevates the accuracy of CODES to 67.3% (+8.8%) and PURPLE to 67.7% (+11.1%) on the Bird benchmark.
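The abstract does not spell out which constraints REDSQL applies, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes one simple constraint, namely that the input relation of each query operation should be non-empty, and checks it against a toy SQLite database. The table `singer`, the helper names `empty_input_violations` and `distinct_values`, and the example predicates are all hypothetical.

```python
import sqlite3


def build_demo_db():
    # Toy in-memory database standing in for the full database content.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", "United States"), ("Luc", "France")],
    )
    return conn


def empty_input_violations(conn, checks):
    """Return the labels of checks whose probe query yields no rows.

    Each check is (label, sql, params), where sql selects the rows that one
    operation of the predicted query would receive or keep. The non-emptiness
    constraint is an assumption made for this sketch.
    """
    violations = []
    for label, sql, params in checks:
        (exists,) = conn.execute(f"SELECT EXISTS ({sql})", params).fetchone()
        if not exists:
            violations.append(label)
    return violations


def distinct_values(conn, table, column, limit=5):
    # Minimal data profile: a few distinct values of one column.
    # Identifiers are interpolated directly, so pass trusted names only.
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1 LIMIT ?',
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()

# Probes for two hypothetical predicted predicates on singer.country.
checks = [
    ("country = 'USA'", "SELECT 1 FROM singer WHERE country = ?", ("USA",)),
    ("country = 'France'", "SELECT 1 FROM singer WHERE country = ?", ("France",)),
]

violations = empty_input_violations(conn, checks)
hints = distinct_values(conn, "singer", "country")

print("violated:", violations)
print("values actually stored:", hints)
```

In this sketch, the flagged predicate and the short list of stored values play the role of the concise, SQL-related information that an LLM could be given when asked to refine the query, instead of the whole database.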

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 7
