Paper Title

STOP: A dataset for Spoken Task Oriented Semantic Parsing

Paper Authors

Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

Paper Abstract

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance of low-resource domain adaptation for end-to-end SLU systems. Initial experiments show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction.
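
The abstract contrasts cascaded pipelines (ASR producing a transcript that a text-based parser consumes) with end-to-end SLU (a single model mapping audio directly to a semantic parse). The following is a minimal, self-contained Python sketch of that structural difference only; it is not code from the paper. All model functions (run_asr, run_nlu, run_end_to_end_slu) are hypothetical placeholders standing in for trained models, and the bracketed output imitates the TOPv2-style decoupled parse format that STOP uses.

```python
# Minimal sketch (not the paper's code): cascaded ASR->NLU vs. end-to-end SLU.
# All model functions below are hypothetical placeholders for trained models.

from typing import List


def run_asr(audio: List[float]) -> str:
    """Placeholder ASR model: audio samples -> transcript."""
    return "set an alarm for 8 am"


def run_nlu(transcript: str) -> str:
    """Placeholder text-based semantic parser: transcript -> parse."""
    return "[IN:CREATE_ALARM [SL:DATE_TIME for 8 am ] ]"


def run_end_to_end_slu(audio: List[float]) -> str:
    """Placeholder end-to-end SLU model: audio -> parse, no intermediate transcript."""
    return "[IN:CREATE_ALARM [SL:DATE_TIME for 8 am ] ]"


def cascaded_slu(audio: List[float]) -> str:
    # Cascaded pipeline: any ASR error propagates into the downstream parser.
    transcript = run_asr(audio)
    return run_nlu(transcript)


if __name__ == "__main__":
    audio = [0.0] * 16000  # one second of silent 16 kHz audio as a stand-in
    print("cascaded:  ", cascaded_slu(audio))
    print("end-to-end:", run_end_to_end_slu(audio))
```

In the cascaded path, a misrecognized transcript changes what the parser sees, which is the error-propagation problem the end-to-end formulation avoids.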
