Paper Title
Improving a Sequence-to-Sequence NLP Model Using a Reinforcement Learning Policy Algorithm
Paper Authors
Paper Abstract
Current neural network models of dialogue generation (chatbots) show great promise for generating responses for conversational agents. However, they are short-sighted: they predict utterances one at a time while disregarding their impact on future outcomes. Modelling a dialogue's future direction is critical for generating coherent, interesting dialogues, a need that has led traditional NLP dialogue models to draw on reinforcement learning. In this article, we explain how to combine these objectives by using deep reinforcement learning to predict future rewards in chatbot dialogue. The model simulates conversations between two virtual agents, with policy gradient methods used to reward sequences that exhibit three useful conversational properties: informality, coherence, and simplicity of response (related to the forward-looking function). We evaluate our model on diversity, length, and complexity relative to human dialogue. In dialogue simulation, evaluations demonstrate that the proposed model generates more interactive responses and encourages more sustained, successful conversations. This work marks a preliminary step toward developing neural conversational models based on the long-term success of dialogues.
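As a rough illustration of the policy-gradient training the abstract describes, the sketch below shows a minimal REINFORCE-style update for a toy sequence-to-sequence dialogue policy. It is written in PyTorch under assumptions of our own: the model sizes, the Seq2SeqPolicy class, and the composite_reward placeholder (standing in for the informality, coherence, and simplicity-of-response rewards) are hypothetical and are not taken from the paper.

# Minimal sketch of a policy-gradient (REINFORCE) update for a seq2seq
# dialogue agent. The architecture, sizes, and reward are illustrative
# placeholders, not the paper's implementation.
import torch
import torch.nn as nn

VOCAB, EMB, HID, MAX_LEN = 100, 32, 64, 10

class Seq2SeqPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRUCell(EMB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src):
        # Encode the input utterance; the final hidden state seeds the decoder.
        _, h = self.encoder(self.embed(src))
        h = h.squeeze(0)
        tok = torch.zeros(src.size(0), dtype=torch.long)  # assume index 0 = <bos>
        log_probs, tokens = [], []
        for _ in range(MAX_LEN):
            h = self.decoder(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()                   # sample the next response token
            log_probs.append(dist.log_prob(tok))  # keep log-prob for the gradient
            tokens.append(tok)
        return torch.stack(tokens, dim=1), torch.stack(log_probs, dim=1)

def composite_reward(response):
    # Placeholder reward: in the paper's setting this would score the
    # informality, coherence, and simplicity of a simulated dialogue turn.
    distinct = response.unique().numel() / response.numel()  # crude diversity proxy
    return torch.tensor(distinct)

policy = Seq2SeqPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

src = torch.randint(1, VOCAB, (1, 8))    # toy input utterance
response, log_probs = policy(src)
reward = composite_reward(response)
loss = -(reward * log_probs.sum())       # REINFORCE: maximize expected reward
opt.zero_grad()
loss.backward()
opt.step()

In the setting the abstract describes, the reward would instead be computed from an exchange simulated between two such agents, with each sampled response fed back as the next agent's input, so that the gradient reflects the long-term quality of the dialogue rather than a single turn.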