Selective Credit Assignment

Paper title

Selective Credit Assignment

Authors

Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt

Abstract

Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings. We describe a unified view of temporal-difference algorithms for selective credit assignment. These selective algorithms apply weightings to quantify the contribution of learning updates. We present insights into applying weightings to value-based learning and planning algorithms, and describe their role in mediating the backward credit distribution in prediction and control. Within this space, we identify as special cases some existing online learning algorithms that can assign credit selectively, and we add new algorithms that assign credit backward in time counterfactually, allowing credit to be assigned off-trajectory and off-policy.
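The idea of weighting how much each learning update contributes can be illustrated with a small tabular sketch. It relies on several assumptions that do not come from the paper. The value function is tabular, credit flows backward through an accumulating eligibility trace, and a hypothetical per-state function `weight(s)` scales how strongly each visited state enters that trace. This is only a simplified illustration of selective, backward-in-time credit assignment. It is not the authors' algorithms, and it omits their off-policy and counterfactual (off-trajectory) variants.

```python
from typing import Callable, List, Sequence, Tuple

# (state, reward, next_state, terminal) -- a hypothetical transition format
Transition = Tuple[int, float, int, bool]


def selective_td_lambda(
    episodes: Sequence[Sequence[Transition]],
    num_states: int,
    weight: Callable[[int], float],  # hypothetical per-state selectivity weight
    alpha: float = 0.1,
    gamma: float = 0.99,
    lam: float = 0.8,
) -> List[float]:
    """Tabular TD(lambda) sketch in which weight(s) scales each state's trace increment.

    Simplified illustration only, not the algorithms from the paper.
    """
    v = [0.0] * num_states
    for episode in episodes:
        trace = [0.0] * num_states  # eligibility trace, reset at the start of each episode
        for s, r, s_next, terminal in episode:
            # Decay every trace, then add the selective weight for the current state.
            for i in range(num_states):
                trace[i] *= gamma * lam
            trace[s] += weight(s)
            # One-step TD error; bootstrap only from non-terminal next states.
            target = r if terminal else r + gamma * v[s_next]
            delta = target - v[s]
            # Distribute the TD error backward in time along the weighted trace.
            for i in range(num_states):
                v[i] += alpha * delta * trace[i]
    return v


if __name__ == "__main__":
    # Chain 0 -> 1 -> 2 (terminal), with reward 1 on the final transition.
    episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
    print(selective_td_lambda([episode] * 50, 3, lambda s: 1.0))
    print(selective_td_lambda([episode] * 50, 3, lambda s: 0.0 if s == 0 else 1.0))
```

When `weight(s) = 1` for every state, the sketch reduces to ordinary accumulating-trace TD(λ). Setting the weight of state 0 to zero leaves `v[0]` at exactly zero, because state 0 never enters the trace and so never receives any of the backward-propagated credit.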
