在可观察到的环境中安全控制以满足LTL规格

论文标题

在可观察到的环境中安全控制以满足LTL规格

Secure Control in Partially Observable Environments to Satisfy LTL Specifications

论文作者

Ramasubramanian, Bhaskar, Niu, Luyao, Clark, Andrew, Bushnell, Linda, Poovendran, Radha

论文摘要

本文研究了在存在对手的情况下，必须在部分可观察到的环境中满足时间逻辑规范的代理的控制策略的综合。代理（防御者）与对手的相互作用被建模为部分可观察到的随机游戏。目标是生成辩护策略，以最大程度地满足任何对手策略下给定的时间逻辑规范的满意度。搜索策略仅限于有限状态控制器的空间，这导致了确定策略的可行方法。我们将规范的满意度与马尔可夫链的复发状态达到（子集的一部分）。我们提出了一种算法，以确定一组固定尺寸的防御者和对手有限状态控制器，该状态尺寸将满足时间逻辑规范，并证明它是正确的。然后，我们提出了一种价值算法，以最大程度地提高满足固定尺寸的有限状态控制器下时间逻辑规范的可能性。最后，我们将此设置扩展到可以增加防御者有限状态控制器的大小以提高满意度概率的情况。我们用一个例子说明了我们的方法。

This paper studies the synthesis of control policies for an agent that has to satisfy a temporal logic specification in a partially observable environment, in the presence of an adversary. The interaction of the agent (defender) with the adversary is modeled as a partially observable stochastic game. The goal is to generate a defender policy to maximize satisfaction of a given temporal logic specification under any adversary policy. The search for policies is limited to the space of finite state controllers, which leads to a tractable approach to determine policies. We relate the satisfaction of the specification to reaching (a subset of) recurrent states of a Markov chain. We present an algorithm to determine a set of defender and adversary finite state controllers of fixed sizes that will satisfy the temporal logic specification, and prove that it is sound. We then propose a value-iteration algorithm to maximize the probability of satisfying the temporal logic specification under finite state controllers of fixed sizes. Lastly, we extend this setting to the scenario where the size of the finite state controller of the defender can be increased to improve the satisfaction probability. We illustrate our approach with an example.

下载PDF全文

下载文献需遵守相关版权规定

论文标题