Paper Title
Hierarchical Transformer for Task Oriented Dialog Systems
Paper Authors
Paper Abstract
Generative models for dialog systems have gained much interest because of the recent success of RNN- and Transformer-based models in tasks like question answering and summarization. Although dialog response generation is generally framed as a sequence-to-sequence (Seq2Seq) problem, researchers in the past have found it challenging to train dialog systems using standard Seq2Seq models. Therefore, to help the model learn meaningful utterance- and conversation-level features, Sordoni et al. (2015b) and Serban et al. (2016) proposed the Hierarchical RNN architecture, which was later adopted by several other RNN-based dialog systems. With Transformer-based models now dominating Seq2Seq problems, a natural question is whether the notion of hierarchy carries over to Transformer-based dialog systems. In this paper, we propose a generalized framework for Hierarchical Transformer Encoders and show how a standard Transformer can be morphed into any hierarchical encoder, including HRED- and HIBERT-like models, by using specially designed attention masks and positional encodings. Through a wide range of experiments, we demonstrate that hierarchical encoding helps Transformer-based models achieve better natural language understanding of dialog contexts in task-oriented dialog systems.
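To make the core idea concrete, the sketch below shows one way a plain attention mask and restarted positional indices can turn a single standard Transformer encoder into a two-level (utterance/context) encoder: ordinary tokens attend only within their own utterance, while CLS-like summary tokens attend to each other across utterances, roughly in the spirit of HIBERT. This is a minimal illustrative sketch, not the paper's exact construction; the helper names hierarchical_attention_mask and local_positions and the per-utterance CLS convention are assumptions introduced here for illustration.

import numpy as np

def hierarchical_attention_mask(utt_ids, ctx_tokens):
    # Illustrative sketch only; the paper's actual mask design may differ.
    # utt_ids    : utterance index of every token in the flattened dialog context
    # ctx_tokens : True for tokens acting as utterance summaries (e.g. a CLS-like
    #              token prepended to each utterance); these form the context level
    # Returns a [T, T] boolean mask where mask[i, j] = True allows token i to attend to token j.
    utt_ids = np.asarray(utt_ids)
    ctx_tokens = np.asarray(ctx_tokens, dtype=bool)
    same_utt = utt_ids[:, None] == utt_ids[None, :]        # utterance-level (local) attention
    ctx_pairs = ctx_tokens[:, None] & ctx_tokens[None, :]  # context-level (cross-utterance) attention
    return same_utt | ctx_pairs

def local_positions(utt_ids):
    # Positional indices that restart at every utterance boundary, so each utterance
    # sees the same positional pattern regardless of where it sits in the dialog.
    utt_ids = np.asarray(utt_ids)
    pos = np.zeros(len(utt_ids), dtype=int)
    for i in range(1, len(utt_ids)):
        pos[i] = 0 if utt_ids[i] != utt_ids[i - 1] else pos[i - 1] + 1
    return pos

if __name__ == "__main__":
    # Two utterances, each prefixed with a CLS-like summary token.
    utt_ids    = [0, 0, 0, 1, 1, 1, 1]
    ctx_tokens = [1, 0, 0, 1, 0, 0, 0]
    print(hierarchical_attention_mask(utt_ids, ctx_tokens).astype(int))
    print(local_positions(utt_ids))  # -> [0 1 2 0 1 2 3]

The resulting boolean mask can be passed to a standard Transformer encoder's self-attention (e.g. as an additive mask of 0 / -inf values), so no change to the model architecture itself is required; only the mask and positional encodings encode the hierarchy.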