Data Agnostic RoBERTa-based Natural Language to SQL Query Generation

Paper title

Data Agnostic RoBERTa-based Natural Language to SQL Query Generation

Authors

Pal, Debaditya; Sharma, Harsh; Chaudhari, Kaustubh

Abstract

Relational databases are among the most widely used architectures for storing massive amounts of data. However, there is a barrier between these databases and the average user, who often lacks knowledge of a query language such as SQL that is required to interact with the database. The NL2SQL task aims to solve this problem with deep learning approaches that convert natural language questions into valid SQL queries. Given the sensitive nature of some databases and the growing need for data privacy, we present an approach with data privacy at its core. We pass RoBERTa embeddings and data-agnostic knowledge vectors into LSTM-based submodels to predict the final query. Although we have not achieved state-of-the-art results, we have eliminated the need for table data from training onward and achieve a test set execution accuracy of 76.7%. By eliminating the table data dependency during training, we have created a model capable of zero-shot learning based on the natural language question and table schema alone.
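The execution accuracy figure quoted in the abstract is commonly computed by running each predicted query and its reference query against the database and counting the cases where the results match. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. It is not the authors' evaluation script: the `execution_accuracy` helper, the `players` table, and the query pairs are all hypothetical and exist only for illustration.

```python
import sqlite3


def execution_accuracy(conn, pairs):
    """Return the fraction of (predicted, gold) SQL pairs whose results match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        # Sort rows so that row order does not affect the comparison.
        gold_rows = sorted(conn.execute(gold_sql).fetchall(), key=repr)
        try:
            predicted_rows = sorted(conn.execute(predicted_sql).fetchall(), key=repr)
        except sqlite3.Error:
            continue  # a predicted query that fails to run counts as wrong
        if predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy table, used only to exercise the metric.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 30), ("Bob", "Blue", 12), ("Cy", "Red", 7)],
)

# Hypothetical (predicted, gold) query pairs.
pairs = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT name FROM players WHERE points > 10",
     "SELECT name FROM players WHERE points >= 12"),
    ("SELECT COUNT(*) FROM players WHERE team = 'Red'",
     "SELECT COUNT(name) FROM players WHERE team = 'Red'"),
    # Different result set: counted as wrong.
    ("SELECT name FROM players WHERE team = 'Blue'",
     "SELECT name FROM players WHERE team = 'Red'"),
    # Misspelled column, so the query fails: counted as wrong.
    ("SELECT nme FROM players", "SELECT name FROM players"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2f}")
```

Note that this metric needs table contents only at evaluation time, when queries are executed. That is separate from the abstract's claim, which concerns removing table data from model training.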
