Paper Title

Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients

Paper Authors

Chris Cundy, Rishi Desai, Stefano Ermon

Paper Abstract

As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
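
As a rough illustration (not the paper's exact formulation), the regularized objective described in the abstract can be sketched as maximizing expected return minus a weighted MI penalty. Here $u$ denotes the sensitive state variable, $a_{1:T}$ the action sequence, $\lambda$ a trade-off weight, and $\gamma$ the discount factor; all of this notation is assumed for the sketch rather than taken from the paper:

$$
% Illustrative sketch; notation assumed, not taken from the paper.
\max_{\theta}\; J_{\lambda}(\theta) \;=\; \mathbb{E}_{\pi_{\theta}}\!\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right] \;-\; \lambda\, I\!\left(u;\, a_{1:T}\right)
$$

Setting $\lambda = 0$ recovers the standard policy gradient objective, while larger $\lambda$ trades reward for reduced disclosure of $u$ through the actions.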
