Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

Paper Title

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

Authors

Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu

Abstract

A task-oriented spoken dialogue system (SDS) aims to assist a human user in accomplishing a specific task (e.g., hotel booking). Dialogue management is a core part of an SDS. It comprises two main tasks: dialogue belief state tracking (summarising the conversation history) and dialogue decision-making (deciding how to reply to the user). In this work, we focus only on devising a policy that chooses which dialogue action to take in response to the user. The sequential system decision-making process can be abstracted as a partially observable Markov decision process (POMDP). Under this framework, reinforcement learning approaches can be used for automated policy optimization. In the past few years, many deep reinforcement learning (DRL) algorithms, which use neural networks (NN) as function approximators, have been investigated for dialogue policy.
