Paper Title
Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task
Paper Authors
Paper Abstract
Human-computer interactive systems that rely on machine learning are becoming paramount to the lives of millions of people who use digital assistants on a daily basis. Yet, further advances are limited by the availability of data and the cost of acquiring new samples. One way to address this problem is to improve the sample efficiency of current approaches. As a solution path, we present a model-based reinforcement learning algorithm for an interactive dialogue task. We build on commonly used actor-critic methods, adding an environment model and planner that augment the learning agent with a model of the environment dynamics. Our results show that, on a simulation that mimics the interactive task, our algorithm requires 70 times fewer samples than the baseline of a commonly used model-free algorithm, and asymptotically achieves 2 times better performance. Moreover, we introduce a novel contribution of computing a soft planner policy and further updating a model-free policy with it, yielding a less computationally expensive model-free agent that performs as well as the model-based one. This model-based architecture serves as a foundation that can be extended to other human-computer interactive tasks, enabling further advances in this direction.
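The abstract describes the architecture only at a high level. As a rough illustration of the pattern it names, here is a minimal tabular sketch, not the paper's implementation: an actor-critic agent, a learned environment model, a one-step planner whose Q-estimates are softened into a "soft planner policy," and a distillation update that pushes the model-free actor toward that planner policy. All specifics here (state/action counts, one-step lookahead, learning rates, and names such as soft_planner_policy) are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the dialogue task: a small tabular MDP (assumed sizes).
N_STATES, N_ACTIONS, GAMMA, TAU = 5, 3, 0.95, 0.5
rng = np.random.default_rng(0)

# Ground-truth dynamics, used only to simulate the environment.
true_P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
true_R = rng.normal(size=(N_STATES, N_ACTIONS))

# Learned environment model: Laplace-smoothed transition counts, reward averages.
counts = np.ones((N_STATES, N_ACTIONS, N_STATES))
reward_sum = np.zeros((N_STATES, N_ACTIONS))
reward_n = np.ones((N_STATES, N_ACTIONS))

# Model-free actor-critic parameters: policy logits and state values.
logits = np.zeros((N_STATES, N_ACTIONS))
V = np.zeros(N_STATES)

def model_P():
    return counts / counts.sum(axis=2, keepdims=True)

def model_R():
    return reward_sum / reward_n

def soft_planner_policy(s):
    # One-step lookahead under the learned model:
    # Q(s, a) = R_hat(s, a) + gamma * sum_s' P_hat(s' | s, a) V(s'),
    # softened into a policy via softmax(Q / tau) (our assumed reading of "soft").
    q = model_R()[s] + GAMMA * model_P()[s] @ V
    z = np.exp((q - q.max()) / TAU)
    return z / z.sum()

s = 0
for step in range(20_000):
    # Act with the model-free policy.
    pi = np.exp(logits[s] - logits[s].max())
    pi /= pi.sum()
    a = rng.choice(N_ACTIONS, p=pi)
    s2 = rng.choice(N_STATES, p=true_P[s, a])
    r = true_R[s, a]

    # Update the learned environment model from the real transition.
    counts[s, a, s2] += 1
    reward_sum[s, a] += r
    reward_n[s, a] += 1

    # Standard actor-critic TD update of the critic.
    td = r + GAMMA * V[s2] - V[s]
    V[s] += 0.05 * td

    # Distill the soft planner policy into the model-free actor:
    # cross-entropy gradient between planner target and current policy.
    target = soft_planner_policy(s)
    logits[s] += 0.1 * (target - pi)

    s = s2

print("greedy actions per state:", logits.argmax(axis=1))
```

The distillation step is one plausible reading of "further updating a model-free policy": the model and planner shape the actor during training, but at deployment only the cheap model-free actor is needed.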