Paper Title
Online Safety Assurance for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Recently, deep learning has been successfully applied to a variety of networking problems. A fundamental challenge is that when the operational environment of a learning-augmented system differs from its training environment, such a system often makes poorly informed decisions, leading to degraded performance. We argue that safely deploying learning-driven systems requires the ability to determine, in real time, whether system behavior is coherent, so that the system can default to a reasonable heuristic when it is not. We term this the online safety assurance problem (OSAP). We present three approaches to quantifying decision uncertainty, which differ in the signal used to infer uncertainty. We illustrate the usefulness of online safety assurance in the context of the proposed deep reinforcement learning (RL) approach to video streaming. While deep RL for video streaming bests other approaches when the operational and training environments match, it is dominated by simple heuristics when the two differ. Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits of ML without compromising safety.
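To make the uncertainty-gated fallback concrete, the sketch below shows one plausible instantiation, not the paper's implementation: decision uncertainty is estimated as the disagreement among an ensemble of policy networks, and the system reverts to a default heuristic when disagreement crosses a threshold. The member interface (an `action_distribution` method), the `default_policy`, and the threshold value are all illustrative assumptions.

```python
# A minimal sketch, assuming an ensemble of trained policy networks whose
# disagreement serves as the uncertainty signal. Names and the threshold
# are hypothetical, not taken from the paper.
import numpy as np

def ensemble_uncertainty(action_dists: np.ndarray) -> float:
    """Disagreement among ensemble members' action distributions.

    action_dists: shape (n_members, n_actions); each row is a softmax
    distribution over discrete actions (e.g., bitrate choices).
    Returns the mean total-variation distance from the ensemble mean.
    """
    mean_dist = action_dists.mean(axis=0)
    tv = 0.5 * np.abs(action_dists - mean_dist).sum(axis=1)
    return float(tv.mean())

def choose_action(state, ensemble, default_policy, threshold=0.2):
    """Act with the learned ensemble unless its members disagree too much."""
    dists = np.stack([m.action_distribution(state) for m in ensemble])
    if ensemble_uncertainty(dists) > threshold:
        # Low confidence: fall back to the safe hand-crafted heuristic
        # (e.g., a buffer-based bitrate rule for video streaming).
        return default_policy(state)
    # High confidence: follow the ensemble's mean policy.
    return int(np.argmax(dists.mean(axis=0)))
```

Thresholded ensemble disagreement is only one candidate uncertainty signal; the three approaches mentioned in the abstract differ precisely in which signal they use to infer uncertainty.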