Paper title
Bertrand-DR: Improving Text-to-SQL using a Discriminative Re-ranker
Paper authors
Abstract
To access data stored in relational databases, users need to understand the database schema and write a query in a query language such as SQL. To simplify this task, text-to-SQL models attempt to translate a user's natural language question into the corresponding SQL query. Recently, several generative text-to-SQL models have been developed. We propose a novel discriminative re-ranker that improves the performance of generative text-to-SQL models by extracting the best SQL query from the beam output predicted by the text-to-SQL generator. This improves performance in the cases where the best query is in the candidate list but not at the top of it. We build the re-ranker as a schema-agnostic, fine-tuned BERT classifier. We analyze the relative strengths of the text-to-SQL and re-ranker models across different query hardness levels, and suggest how to combine the two models for optimal performance. We demonstrate the effectiveness of the re-ranker by applying it to two state-of-the-art text-to-SQL models, achieving a top-4 score on the Spider leaderboard at the time of writing.
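The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `rerank`, `toy_score`, and `threshold` are hypothetical. The scoring function is a toy token-overlap stand-in for the fine-tuned BERT classifier. The threshold fallback is only one assumed way of combining generator and re-ranker, since the abstract does not specify the combination rule.

```python
from typing import Callable, Sequence


def rerank(
    question: str,
    beam: Sequence[str],
    score: Callable[[str, str], float],
    threshold: float = 0.0,
) -> str:
    """Pick one SQL query from the generator's beam output.

    `beam` is ordered as the generator ranked it, so beam[0] is its top-1.
    `score` stands in for the discriminative re-ranker: it maps a
    (question, SQL) pair to a relevance score.
    """
    if not beam:
        raise ValueError("beam must contain at least one candidate")
    scores = [score(question, sql) for sql in beam]
    # Index of the highest-scoring candidate (first one wins on ties).
    best = max(range(len(beam)), key=scores.__getitem__)
    # Assumed combination rule (not taken from the paper): override the
    # generator's top-1 only when the re-ranker's best score clears a threshold.
    return beam[best] if scores[best] >= threshold else beam[0]


def toy_score(question: str, sql: str) -> float:
    """Toy stand-in for a BERT classifier: fraction of question tokens
    that also appear in the SQL string. For illustration only."""
    q_tokens = set(question.lower().split())
    sql_tokens = set(sql.lower().replace(",", " ").split())
    return len(q_tokens & sql_tokens) / max(len(q_tokens), 1)


question = "list name of singer"
beam = ["SELECT age FROM singer", "SELECT name FROM singer"]
chosen = rerank(question, beam, toy_score)
print(chosen)
```

Here the correct query sits second in the beam, which is the case the re-ranker targets: the generator's top-1 is wrong, but a better candidate is present in the list. In practice `score` would wrap a classifier that sees only the question and the candidate SQL, in keeping with the schema-agnostic design.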