Paper Title
A Mixture-of-Expert Approach to RL-based Dialogue Management
Paper Authors
Paper Abstract
Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and their ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-sized vocabulary. As a result, they struggle to produce a successful and engaging dialogue even if they are warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-expert language model (MoE-LM) that consists of (i) an LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. Our MoE approach provides greater flexibility to generate sensible utterances with different intents and allows RL to focus on conversation-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance.
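
The key idea in the abstract, planning over expert-generated candidates rather than over individual words, can be illustrated with a minimal Python sketch. All names here (Expert, select_utterance, q_value) are hypothetical stand-ins for illustration, not the authors' implementation or API:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Expert:
        # A specialized LM that generates utterances with a fixed
        # intent or personality (hypothetical interface).
        name: str
        generate: Callable[[str], str]  # conversation history -> candidate utterance

    def select_utterance(history: str,
                         experts: List[Expert],
                         q_value: Callable[[str, str], float]) -> str:
        # Each expert proposes one candidate, so the DM's action space
        # has size len(experts) instead of growing combinatorially with
        # vocabulary size and utterance length.
        candidates = [e.generate(history) for e in experts]
        # q_value stands in for a learned action-value estimate of
        # long-term user satisfaction (assumed, not from the paper).
        return max(candidates, key=lambda u: q_value(history, u))

Under these assumptions, the RL problem reduces to choosing among a handful of sensible, intent-specific candidates, which is what lets the DM focus on conversation-level planning rather than word-level generation.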