Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers

Paper Title

Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers

Authors

Corbeil, Jean-Philippe, Li, Mia Taige, Ghavidel, Hadi Abdi

Abstract

For companies with customer service, mapping intents inside their conversational data is crucial in building applications based on natural language understanding (NLU). Nevertheless, there is no established automated technique to gather the intents from noisy online chats or voice transcripts. Simple clustering approaches are not suited to intent-sparse dialogues. To solve this intent-landscape task, we propose an unsupervised pipeline that extracts the intents and the taxonomy of intents from real-world dialogues. Our pipeline mines intent-span candidates with an extractive question-answering ELECTRA model and leverages sentence embeddings to apply a low-level density clustering followed by a top-level hierarchical clustering. Our results demonstrate the generalization ability of an ELECTRA large model fine-tuned on the SQuAD2 dataset to understand dialogues. With the right prompting question, this model achieves a rate of linguistic validation on intent spans beyond 85%. Furthermore, we reconstructed the intent schemes of five domains from the MultiDoGo dataset with an average recall of 94.3%.
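
The two clustering stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which density or hierarchical algorithm is used, so plain DBSCAN and single-linkage agglomerative merging stand in here as assumptions. The 2-D tuples are toy placeholders for sentence embeddings of extracted intent spans, and `eps`, `min_pts`, and `n_groups` are illustrative values, not settings from the paper.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN (assumed stand-in): one label per point, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nb)
    return labels

def centroids(points, labels):
    """Mean vector of each low-level cluster, ignoring noise."""
    groups = {}
    for p, label in zip(points, labels):
        if label >= 0:
            groups.setdefault(label, []).append(p)
    return {l: tuple(sum(c) / len(ps) for c in zip(*ps)) for l, ps in groups.items()}

def agglomerate(cents, n_groups):
    """Single-linkage merging of low-level clusters into n_groups top-level groups."""
    groups = [[l] for l in sorted(cents)]

    def linkage(g1, g2):
        return min(math.dist(cents[a], cents[b]) for a in g1 for b in g2)

    while len(groups) > n_groups:
        pairs = ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups)))
        i, j = min(pairs, key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        groups[i] += groups.pop(j)  # j > i, so index i is still valid
    return groups

# Toy "embeddings": four tight clusters forming two broader groups, plus one outlier.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (1.0, 0.0), (1.1, 0.0), (1.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
    (11.0, 10.0), (11.1, 10.0), (11.0, 10.1),
    (5.0, 5.0),
]
labels = dbscan(points, eps=0.3, min_pts=2)
groups = agglomerate(centroids(points, labels), n_groups=2)
print(labels)
print(groups)
```

In this sketch the low-level stage yields fine-grained intent clusters and discards the outlier as noise, while the top-level stage groups those clusters into broader categories, which mirrors the intent-and-taxonomy output the abstract describes.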

Paper Title