Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Paper Title

Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question

Authors

Song, Yuanfeng; Wong, Raymond Chi-Wing; Zhao, Xuefang; Jiang, Di

Abstract

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient way for human-computer interaction. This paper works towards designing more effective speech-based interfaces to query the structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem can work in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from the error compounding problem between the two components, resulting in limited performance. To handle these challenges, we further propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information present in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL based on arbitrary natural language questions, rather than a natural language-based version of SQL or its variants with a limited SQL grammar. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL by piggybacking on widely used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact match accuracy.
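The cascaded baseline and the error-compounding problem described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical and is not SpeechSQLNet or the paper's evaluation code: "utterances" are plain ID strings, the ASR and text-to-SQL stages are lookup tables, and `exact_match_accuracy` is a deliberately simplified string-level metric (text-to-SQL benchmarks typically compare SQL components rather than raw strings). It only shows how a single misrecognized word in the first stage yields a wrong query in the second.

```python
from typing import Callable, Dict, List

# Toy stand-ins: an "utterance" is just an ID string here, because real
# audio handling is out of scope for this sketch.
ASR = Callable[[str], str]          # utterance ID -> transcript
TextToSQL = Callable[[str], str]    # transcript -> SQL string


def cascade(asr: ASR, text_to_sql: TextToSQL) -> Callable[[str], str]:
    """Cascaded baseline: ASR first, then text-to-SQL on the transcript.

    Any recognition error is passed unchanged to the second stage,
    which is the error-compounding problem of the cascaded approach.
    """
    def run(utterance: str) -> str:
        return text_to_sql(asr(utterance))
    return run


def normalize(sql: str) -> str:
    """Simplified normalization: lowercase, drop ';', collapse whitespace."""
    return " ".join(sql.lower().replace(";", "").split())


def exact_match_accuracy(predictions: List[str], golds: List[str]) -> float:
    """Fraction of predictions whose normalized string equals the gold SQL."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("need equally sized, non-empty prediction/gold lists")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, golds))
    return hits / len(golds)


# Hypothetical toy components: the "ASR" mishears "cinema" as "sinner".
toy_transcripts: Dict[str, str] = {
    "utt1": "how many singers are there",
    "utt2": "list all sinner names",
}
toy_parses: Dict[str, str] = {
    "how many singers are there": "SELECT count(*) FROM singer",
    "list all cinema names": "SELECT name FROM cinema",
}

pipeline = cascade(lambda u: toy_transcripts[u],
                   lambda q: toy_parses.get(q, ""))  # unknown text -> no query

golds = ["SELECT count(*) FROM singer", "SELECT name FROM cinema"]
predictions = [pipeline(u) for u in ["utt1", "utt2"]]
print(exact_match_accuracy(predictions, golds))  # 0.5
```

Although the toy "parser" would handle the correct transcript of the second question, the ASR error means the cascade never reaches it. An end-to-end model such as SpeechSQLNet avoids this hand-off by mapping speech to SQL without an intermediate transcript.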
