Automatic Discovery of Novel Intents & Domains from Text Utterances

Paper Title

Automatic Discovery of Novel Intents & Domains from Text Utterances

Authors

Nikhita Vedula, Rahul Gupta, Aman Alok, Mukund Sridhar

Abstract

One of the primary tasks in Natural Language Understanding (NLU) is to recognize the intents as well as the domains of users' spoken and written language utterances. Most existing research formulates this as a supervised classification problem with a closed-world assumption, i.e., the domains or intents to be identified are pre-defined or known beforehand. Real-world applications, however, increasingly encounter dynamic, rapidly evolving environments with newly emerging intents and domains, about which no information is known during model training. We propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled data. We first employ an open classification model to identify all utterances potentially consisting of a novel intent. Next, we build a knowledge transfer component with a pairwise margin loss function. It learns discriminative deep features to group together utterances and discover multiple latent intent categories within them in an unsupervised manner. We finally hierarchically link mutually related intents into domains, forming an intent-domain taxonomy. ADVIN significantly outperforms baselines on three benchmark datasets, and on real user utterances from a commercial voice-powered agent.
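The abstract does not give the exact form of the pairwise margin loss, so the following is only a minimal sketch of one common contrastive-style formulation, not ADVIN's actual objective. The function names, the Euclidean distance, and the default margin are all assumptions made for illustration. The idea it shows: pairs of utterance embeddings with the same intent are pulled together, while pairs with different intents are pushed apart until they are at least `margin` away.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_margin_loss(u, v, same_intent, margin=1.0):
    """Generic contrastive-style pairwise margin loss (illustrative only).

    Hypothetical formulation, not taken from the paper:
    - same-intent pairs are penalized by their squared distance;
    - different-intent pairs are penalized only while closer than `margin`.
    """
    d = euclidean(u, v)
    if same_intent:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    a, b = [0.0, 0.0], [3.0, 4.0]  # toy 2-d "embeddings", distance 5
    print(pairwise_margin_loss(a, b, same_intent=True))   # large: should be closer
    print(pairwise_margin_loss(a, b, same_intent=False))  # zero: already far apart
```

Under a loss of this kind, features learned on labeled intents can cluster unlabeled utterances, because the learned distance already reflects whether two utterances share an intent.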
