ValueNet: A Natural Language-to-SQL System that Learns from Database Information

Title

ValueNet: A Natural Language-to-SQL System that Learns from Database Information

Authors

Brunner, Ursin; Stockinger, Kurt

Abstract

Building natural language (NL) interfaces for databases has been a long-standing challenge for several decades. The major advantage of these so-called NL-to-SQL systems is that end-users can query complex databases without needing to know SQL or the underlying database schema. Due to significant advancements in machine learning, recent research has focused on neural networks to tackle this challenge on complex datasets like Spider. Several recent NL-to-SQL systems achieve promising results on this dataset. However, none of the published systems that provide either source code or executable binaries extract and incorporate values from the user question when generating SQL statements. Thus, the practical use of these systems in real-world scenarios has not yet been sufficiently demonstrated. In this paper we propose ValueNet light and ValueNet, two end-to-end NL-to-SQL systems that incorporate values and are evaluated on the challenging Spider dataset. The main idea of our approach is to use as input for our neural network architecture not only metadata from the underlying database but also information on the base data. In particular, we propose a novel architecture sketch to extract values from a user question and to come up with possible value candidates that are not explicitly mentioned in the question. We then use a neural model based on an encoder-decoder architecture to synthesize the SQL query. Finally, we evaluate our models on the Spider challenge using the Execution Accuracy metric, which is more difficult than the metric used by most participants of the challenge. Our experimental evaluation demonstrates that ValueNet light and ValueNet reach state-of-the-art results of 67% and 62% accuracy, respectively, for translating from NL to SQL while incorporating values.
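Execution Accuracy runs the predicted SQL and the gold SQL against the database and compares their results, so a query with the correct structure but a wrong value counts as an error. The Python sketch below illustrates that idea under simplifying assumptions: it uses an in-memory SQLite database with a hypothetical `singer` table, compares result rows as multisets unless the gold query contains `ORDER BY`, and treats a prediction that fails to execute as incorrect. The function names `execution_match` and `execution_accuracy` are hypothetical; this is not the official Spider evaluation script or the authors' code.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same result on the given database.

    Simplified sketch (hypothetical helper): rows are compared as multisets,
    so row order is ignored, unless the gold query contains ORDER BY, in which
    case the order must match too. A prediction that raises an SQLite error
    counts as incorrect.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Naive substring check; a real evaluator would parse the SQL instead.
    if "order by" in gold_sql.lower():
        return predicted_rows == gold_rows
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) query pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return hits / len(pairs)


# Toy database (hypothetical data, not taken from Spider).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Joe", "France", 30), ("Ann", "Netherlands", 25), ("Tim", "France", 41)],
)

gold = "SELECT name FROM singer WHERE country = 'France'"
pairs = [
    # Correct structure and correct value: results match.
    ("SELECT name FROM singer WHERE country = 'France'", gold),
    # Correct structure but wrong value: results differ.
    ("SELECT name FROM singer WHERE country = 'Netherlands'", gold),
    # Placeholder instead of an extracted value: returns no rows here.
    ("SELECT name FROM singer WHERE country = 'value'", gold),
]
print(execution_accuracy(conn, pairs))  # prints 0.3333333333333333
```

Only the first prediction matches, so the sketch reports an accuracy of one third. The second and third predictions share the gold query's structure and differ only in the value, which is the kind of error a metric that ignores values would not penalize.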
