Paper Title
GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning
Paper Authors
Paper Abstract
A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which goes beyond language generation. However, existing dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets to reach their goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training chatbots end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework uses hierarchical reinforcement learning (HRL), in which a high-level policy guides the conversation towards the final goal by determining sub-goals, and a low-level policy fulfills each sub-goal by generating the corresponding response utterance. In our experiments on a real-world dialogue dataset for anti-fraud in finance, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
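The two-level control flow described in the abstract can be illustrated with a toy sketch. This is not the GoChat implementation: the sub-goal labels, the canned utterances, the rule-based policies, and the discount factor are all hypothetical stand-ins chosen for illustration, where the paper would use learned policies trained on offline dialogues. The sketch only shows how a high-level policy might pick a sub-goal each turn, how a low-level policy might turn it into a response, and how a long-term (discounted) return over a dialogue could be computed.

```python
from typing import Dict, List

# Hypothetical sub-goal labels; the source does not list the real ones.
SUB_GOALS: List[str] = ["greet", "verify_identity", "ask_about_transaction", "close"]

# Hypothetical canned utterances standing in for a learned response generator.
TEMPLATES: Dict[str, str] = {
    "greet": "Hello, this is the risk-control team.",
    "verify_identity": "Could you confirm the name on the account?",
    "ask_about_transaction": "Did you authorize the recent transfer?",
    "close": "Thank you, that is all we need.",
}


def high_level_policy(history: List[str]) -> str:
    """Pick the next sub-goal (toy rule: one sub-goal per agent turn, in order)."""
    agent_turns = sum(1 for turn in history if turn.startswith("agent:"))
    return SUB_GOALS[min(agent_turns, len(SUB_GOALS) - 1)]


def low_level_policy(sub_goal: str, history: List[str]) -> str:
    """Produce an utterance that fulfills the sub-goal (toy rule: template lookup)."""
    return TEMPLATES[sub_goal]


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Sum of rewards weighted by gamma**t, accumulated from the last turn backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def run_dialogue(user_turns: List[str]) -> List[str]:
    """Alternate agent and user turns; the agent acts through the two policies."""
    history: List[str] = []
    for user_utterance in user_turns:
        sub_goal = high_level_policy(history)
        history.append("agent: " + low_level_policy(sub_goal, history))
        history.append("user: " + user_utterance)
    return history


if __name__ == "__main__":
    for line in run_dialogue(["Hi.", "Yes, it is Li Wei.", "No, I did not."]):
        print(line)
    # A reward given only at the end of the dialogue (e.g., goal reached).
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))
```

In a learned system, both rule-based functions would be replaced by trained models, and the return would serve as the training signal for the high-level policy.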