Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Title


Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness

Authors

Zhang, Zheng; Liao, Lizi; Zhu, Xiaoyan; Chua, Tat-Seng; Liu, Zitao; Huang, Yan; Huang, Minlie

Abstract


Most existing approaches to goal-oriented dialogue policy learning use reinforcement learning, which focuses on the target agent's policy and simply treats the opposite agent's policy as part of the environment. In real-world scenarios, however, the behavior of an opposite agent often exhibits certain patterns or reflects an underlying hidden policy, which the target agent can infer and exploit to facilitate its own decision making. This strategy is common in human mental simulation, where one first imagines a specific action and its probable results before actually performing it. We therefore propose an opposite behavior aware framework for policy learning in goal-oriented dialogues. We estimate the opposite agent's policy from its behavior and use this estimate to improve the target agent by treating it as part of the target policy. We evaluate our model on both cooperative and competitive dialogue tasks, showing superior performance over state-of-the-art baselines.
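The abstract names two steps: estimate the opposite agent's policy from its observed behavior, then use that estimate when the target agent picks its own action, much like imagining an action's probable result before taking it. The Python sketch below illustrates one simple way to do this, under assumptions that are ours and not the paper's: a tabular, count-based estimate of the opposite policy with Laplace smoothing, and a one-step expected-reward lookahead over the opposite agent's predicted replies. The names `OppositePolicyEstimator`, `choose_action`, and the toy negotiation states and rewards are hypothetical. This is not the model described in the paper, which the abstract specifies only at a high level.

```python
from collections import Counter, defaultdict


class OppositePolicyEstimator:
    """Hypothetical tabular estimate of the opposite agent's policy pi_opp(action | state).

    Counts observed (state, action) pairs and applies Laplace (add-alpha) smoothing.
    """

    def __init__(self, actions, alpha=1.0):
        self.actions = list(actions)
        self.alpha = alpha
        self.counts = defaultdict(Counter)

    def observe(self, state, action):
        # Record one observed move of the opposite agent in `state`.
        self.counts[state][action] += 1

    def prob(self, state, action):
        # Smoothed relative frequency; states never observed fall back to uniform.
        c = self.counts[state]
        total = sum(c.values()) + self.alpha * len(self.actions)
        return (c[action] + self.alpha) / total


def choose_action(state, my_actions, estimator, next_state_fn, reward_fn):
    """Pick the target agent's action with the highest expected reward, "imagining"
    the opposite agent's reply through the estimated policy (one-step lookahead)."""
    best_action, best_value = None, float("-inf")
    for a in my_actions:
        s_next = next_state_fn(state, a)
        value = sum(
            estimator.prob(s_next, o) * reward_fn(s_next, o)
            for o in estimator.actions
        )
        if value > best_value:
            best_action, best_value = a, value
    return best_action


# Toy negotiation with made-up data: the state after our move is simply our offer.
est = OppositePolicyEstimator(actions=["accept", "reject"])
for _ in range(3):
    est.observe("offer_low", "reject")
est.observe("offer_low", "accept")
for _ in range(4):
    est.observe("offer_high", "accept")

REWARDS = {("offer_low", "accept"): 10.0, ("offer_high", "accept"): 6.0}


def reward(state, opp_action):
    # Hypothetical payoff: the target agent earns nothing if its offer is rejected.
    return REWARDS.get((state, opp_action), 0.0)


choice = choose_action(
    state="start",
    my_actions=["offer_low", "offer_high"],
    estimator=est,
    next_state_fn=lambda s, a: a,
    reward_fn=reward,
)
print(choice)  # offer_high (expected reward about 5.0 vs. about 3.33 for offer_low)
```

In this toy run the low offer pays more if accepted, but the estimated opposite policy predicts it will usually be rejected, so the lookahead prefers the high offer. A learned system would replace the count table with a trained model of the opposite agent; the counts here only keep the sketch self-contained.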


