GenTUS: Simulating User Behaviour and Language in Task-oriented Dialogues with Generative Transformers

Paper Title

GenTUS: Simulating User Behaviour and Language in Task-oriented Dialogues with Generative Transformers

Authors

Hsien-Chin Lin, Christian Geishauser, Shutong Feng, Nurul Lubis, Carel van Niekerk, Michael Heck, Milica Gašić

Abstract

User simulators (USs) are commonly used to train task-oriented dialogue systems (DSs) via reinforcement learning. The interactions often take place on the semantic level for efficiency, but there is still a gap between semantic actions and natural language, which causes a mismatch between the training and deployment environments. Incorporating a natural language generation (NLG) module with USs during training can partly address this problem. However, since the policy and NLG of USs are optimised separately, the simulated user utterances may not be natural enough in a given context. In this work, we propose a generative transformer-based user simulator (GenTUS). GenTUS consists of an encoder-decoder structure, which means it can optimise the user policy and natural language generation jointly. GenTUS generates both semantic actions and natural language utterances, preserving interpretability and enhancing language variation. In addition, by representing the inputs and outputs as word sequences and by using a large pre-trained language model, we can achieve generalisability in feature representation. We evaluate GenTUS with automatic metrics and human evaluation. Our results show that GenTUS generates more natural language and is able to transfer to an unseen ontology in a zero-shot fashion. In addition, its behaviour can be further shaped with reinforcement learning, opening the door to training specialised user simulators.
