Paper title
DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space
Paper authors
Paper abstract
Assessing the progress of policy learning algorithms on hierarchical tasks with high-dimensional action spaces has been difficult because no commonly accepted benchmark exists. In this work, we propose Diner Dash, a new lightweight benchmark task for evaluating performance on complicated tasks with high-dimensional action spaces. In contrast to traditional Atari games, which have only a flat goal structure and very few actions, the proposed benchmark has a hierarchical task structure and an action space of size 57, and can therefore facilitate the development of policy learning for complicated tasks. In addition, we introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines graph modelling and deep learning to allow explicit embedding of domain knowledge, and that achieves significant improvement over the baseline. In the experiments, we demonstrate the effectiveness of domain knowledge injection through a specially designed imitation algorithm, and we report results for other popular algorithms.