Paper Title

Differentially Private Reinforcement Learning with Linear Function Approximation

Paper Authors

Zhou, Xingyu

Paper Abstract

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-action MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular, linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms, based on value iteration and policy optimization respectively, and show that they enjoy sub-linear regret while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in today's large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning, but also serves as a building block for general private reinforcement learning.
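For reference, the linear mixture MDP model named in the abstract assumes that the unknown transition kernel is a linear combination of d known basis kernels. The sketch below is the standard formulation from the linear mixture MDP literature, not the paper's exact notation; the symbols phi, theta_h, d, and B are generic placeholders.

% Standard linear mixture MDP assumption (generic notation, may differ from the paper):
% at each step h, the unknown transition kernel is a linear combination of d known
% basis kernels phi_i, parameterized by an unknown vector theta_h in R^d.
\[
\mathbb{P}_h(s' \mid s, a)
  \;=\; \big\langle \phi(s' \mid s, a),\, \theta_h \big\rangle
  \;=\; \sum_{i=1}^{d} \theta_{h,i}\, \phi_i(s' \mid s, a),
\qquad \|\theta_h\|_2 \le B .
\]

Under this assumption the learner only needs to estimate the d-dimensional parameters theta_h, so regret depends on the feature dimension d rather than the number of states, which is why the bounds quoted above are independent of the size of the state space.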
