Meta Dialogue Policy Learning

Paper Title

Meta Dialogue Policy Learning

Authors

Yumo Xu, Chenguang Zhu, Baolin Peng, Michael Zeng

Abstract

Dialog policy determines the next-step actions for agents and hence is central to a dialogue system. However, when migrated to novel domains with little data, a policy model can fail to adapt due to insufficient interactions with the new environment. We propose Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains, such as dialogue acts and slots. We decompose the state and action representation space into feature subspaces corresponding to these low-level components to facilitate cross-domain knowledge transfer. Furthermore, we embed DTQN in a meta-learning framework and introduce Meta-DTQN with a dual-replay mechanism to enable effective off-policy training and adaptation. In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency on the multi-domain dialogue dataset MultiWOZ 2.0.
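
The idea of splitting action representations into dialogue-act and slot subspaces can be illustrated with a toy sketch. This is not the DTQN architecture from the paper: the embedding size, the act and slot names, the random embeddings, and the dot-product scorer are all assumptions chosen for illustration. It only shows why factoring an action into shared low-level parts lets a new domain reuse what other domains already have.

```python
import random

DIM = 4  # hypothetical embedding size for each subspace


def make_embedding(names, dim=DIM, seed=0):
    """Build a fixed random vector per name (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


# Low-level components shared across domains (names are illustrative).
ACT_EMB = make_embedding(["inform", "request", "confirm"], seed=1)
SLOT_EMB = make_embedding(["price", "area", "food", "departure"], seed=2)


def action_vector(act, slot):
    """Represent an action as the concatenation of its act and slot features."""
    return ACT_EMB[act] + SLOT_EMB[slot]


def q_value(state_vec, act, slot):
    """Score a (state, action) pair with a dot product.

    A real Q-network would be learned; the dot product is only a placeholder.
    """
    return sum(s * a for s, a in zip(state_vec, action_vector(act, slot)))


def best_action(state_vec, candidate_actions):
    """Pick the candidate (act, slot) pair with the highest score."""
    return max(candidate_actions, key=lambda pair: q_value(state_vec, *pair))


# Two domains with different action sets built from the same acts and slots.
restaurant_actions = [("request", "food"), ("inform", "price"), ("confirm", "area")]
train_actions = [("request", "departure"), ("inform", "price")]

state = [0.5] * (2 * DIM)  # hypothetical state vector matching the action size
chosen_restaurant = best_action(state, restaurant_actions)
chosen_train = best_action(state, train_actions)
print(chosen_restaurant, chosen_train)
```

In this sketch the train domain reuses the "inform", "request", and "price" vectors already used by the restaurant domain, so only the "departure" slot is specific to it.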
