Paper title
Information-Theoretic Safe Exploration with Gaussian Processes
Authors
Abstract
We consider a sequential decision-making task where we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown constraint and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Our approach is naturally applicable to continuous domains and does not require additional hyperparameters. We theoretically analyze the method and show that, with high probability, we do not violate the safety constraint and that we explore by learning about the constraint up to arbitrary precision. Empirical evaluations demonstrate improved data efficiency and scalability.
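The "common approach" mentioned in the abstract (a GP prior on the unknown constraint, with evaluations allowed only where the constraint is safe with high probability) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the RBF kernel and its lengthscale, the zero prior mean, the safety threshold of 0, the confidence scaling `beta`, and the one-dimensional grid. The final selection step uses plain uncertainty sampling as a stand-in; it is not the information-theoretic criterion the paper proposes, which is also described as working directly on continuous domains rather than on a grid.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs (assumed kernel).
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    # Zero-mean GP regression: posterior mean and standard deviation at x_query.
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def safe_mask(mean, std, beta=2.0):
    # A point counts as safe if the lower confidence bound on the constraint
    # is non-negative. Both beta and the threshold 0 are illustrative choices.
    return mean - beta * std >= 0.0

# One known-safe seed observation at x = 0 with constraint value 1 (made-up data).
x_train = np.array([0.0])
y_train = np.array([1.0])
grid = np.linspace(-2.0, 2.0, 81)

mean, std = gp_posterior(x_train, y_train, grid)
mask = safe_mask(mean, std)

# Stand-in acquisition: the most uncertain point inside the safe set.
# This is NOT the paper's information-theoretic criterion.
next_x = grid[mask][np.argmax(std[mask])]
```

With a single seed observation the safe set is a small neighbourhood of the seed, and each new safe evaluation can enlarge it. The grid in this sketch is exactly the kind of discretization the abstract identifies as a limitation of existing methods.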