Paper Title

Adaptive Dialog Policy Learning with Hindsight and User Modeling

Authors

Yan Cao, Keting Lu, Xiaoping Chen, Shiqi Zhang

Abstract

Reinforcement learning methods have been used to compute dialog policies from language-based interaction experiences. Efficiency is of particular importance in dialog policy learning, because of the considerable cost of interacting with people, and the very poor user experience from low-quality conversations. Aiming at improving the efficiency of dialog policy learning, we develop algorithm LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcements respectively. Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
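The abstract names two sources of extra training signal: additional experience from a user simulator and additional (positive) reinforcements from hindsight. The sketch below illustrates those two ingredients in isolation; the names (`UserSimulator`, `relabel_with_hindsight`), the toy state representation, and the reward values are assumptions for illustration only, not the paper's actual interface or the LHUA algorithm itself.

```python
# Illustrative sketch (assumed interfaces, not the paper's code):
# (1) roll out extra dialog episodes against a toy user simulator, and
# (2) hindsight-relabel a failed dialog as if the reached state had been
#     the goal, so the agent still receives a positive terminal reward.
import random
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[int, ...]

@dataclass
class Turn:
    state: State    # abstract dialog state
    action: int     # system action id
    reward: float   # per-turn reward
    goal: State     # user goal the agent was trying to satisfy

def relabel_with_hindsight(episode: List[Turn]) -> List[Turn]:
    """Copy an episode, replacing the goal with the state actually reached
    and granting an (assumed) success reward at the final turn."""
    achieved = episode[-1].state
    relabeled = [Turn(t.state, t.action, t.reward, achieved) for t in episode]
    relabeled[-1].reward = 1.0  # assumed success reward
    return relabeled

class UserSimulator:
    """Toy user model: samples goals and noisy responses for extra rollouts."""
    def sample_goal(self) -> State:
        return tuple(random.randint(0, 4) for _ in range(3))

    def respond(self, state: State, action: int) -> Tuple[State, float, bool]:
        next_state = tuple(min(s + (action == i), 4) for i, s in enumerate(state))
        done = random.random() < 0.2
        return next_state, -0.05, done  # small per-turn cost

def collect_simulated_episode(sim: UserSimulator, policy) -> List[Turn]:
    goal, state, episode = sim.sample_goal(), (0, 0, 0), []
    for _ in range(10):
        action = policy(state, goal)
        next_state, reward, done = sim.respond(state, action)
        episode.append(Turn(state, action, reward, goal))
        state = next_state
        if done:
            break
    return episode

if __name__ == "__main__":
    sim = UserSimulator()
    random_policy = lambda s, g: random.randint(0, 2)
    episode = collect_simulated_episode(sim, random_policy)
    buffer = episode + relabel_with_hindsight(episode)  # original + hindsight copy
    print(f"{len(buffer)} turns added to the replay buffer")
```

As described in the abstract, LHUA additionally adapts its learning between simulated and real users; that adaptation mechanism is not covered by this data-augmentation sketch.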
