Paper Title

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Paper Authors

Lee, Donghoon

Paper Abstract

Augmenting the reward with entropy is known to soften the greedy argmax policy into a softmax policy. This entropy augmentation is reformulated, which motivates introducing an additional entropy term into the objective function, in the form of a KL-divergence, to regularize the optimization process. The result is a policy that monotonically improves while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
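For context, the following is a minimal sketch of the standard identities the abstract refers to, written in generic notation; the paper's own definitions (the temperatures $\tau$, $\eta$ and the exact objective) may differ. Adding an entropy bonus at temperature $\tau$ to the reward turns the per-state greedy argmax policy into a softmax over Q-values:

$$
\pi^{*}(a \mid s) = \frac{\exp\!\big(Q(s,a)/\tau\big)}{\sum_{a'} \exp\!\big(Q(s,a')/\tau\big)} \;\xrightarrow{\;\tau \to 0\;}\; \operatorname*{argmax}_{a} Q(s,a).
$$

Adding, on top of this, a KL-divergence penalty toward the current policy $\pi_{\mathrm{old}}$ with weight $\eta$, the per-state maximizer of $\mathbb{E}_{a \sim \pi}[Q(s,a)] + \tau\,\mathcal{H}(\pi) - \eta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{old}})$ has the closed form

$$
\pi(a \mid s) \;\propto\; \pi_{\mathrm{old}}(a \mid s)^{\frac{\eta}{\tau+\eta}} \exp\!\Big(\frac{Q(s,a)}{\tau+\eta}\Big),
$$

which recovers $\pi_{\mathrm{old}}$ as $\eta \to \infty$ and the softmax greedy policy $\propto \exp\!\big(Q(s,a)/\tau\big)$ as $\eta \to 0$. This is one standard way a single regularization weight traces a continuous path from policy-gradient-style conservative updates to Q-learning-style greedy updates.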
