Paper Title

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Paper Authors

Lee, Donghoon

Paper Abstract

Augmenting the reward with entropy is known to soften the greedy argmax policy into a softmax policy. This entropy augmentation is reformulated, which motivates introducing an additional entropy term into the objective function, in the form of a KL-divergence, to regularize the optimization process. The result is a policy that monotonically improves while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
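For context, the following is a minimal sketch of the standard identities the abstract refers to, written in generic notation; the paper's own definitions (the temperatures $\tau$, $\eta$ and the exact objective) may differ. Adding an entropy bonus at temperature $\tau$ to the reward turns the per-state greedy argmax policy into a softmax over Q-values:

$$
\pi^{*}(a \mid s) = \frac{\exp\!\big(Q(s,a)/\tau\big)}{\sum_{a'} \exp\!\big(Q(s,a')/\tau\big)} \;\xrightarrow{\;\tau \to 0\;}\; \operatorname*{argmax}_{a} Q(s,a).
$$

Adding, on top of this, a KL-divergence penalty toward the current policy $\pi_{\mathrm{old}}$ with weight $\eta$, the per-state maximizer of $\mathbb{E}_{a \sim \pi}[Q(s,a)] + \tau\,\mathcal{H}(\pi) - \eta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{old}})$ has the closed form

$$
\pi(a \mid s) \;\propto\; \pi_{\mathrm{old}}(a \mid s)^{\frac{\eta}{\tau+\eta}} \exp\!\Big(\frac{Q(s,a)}{\tau+\eta}\Big),
$$

which recovers $\pi_{\mathrm{old}}$ as $\eta \to \infty$ and the softmax greedy policy $\propto \exp\!\big(Q(s,a)/\tau\big)$ as $\eta \to 0$. This is one standard way a single regularization weight traces a continuous path from policy-gradient-style conservative updates to Q-learning-style greedy updates.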
