Paper Title

Safe Learning for Near Optimal Scheduling

Authors

Damien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha, Guillermo A. Pérez, Jean-François Raskin

Abstract

In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 10^20 states and beyond, which cannot be handled with state-of-the-art probabilistic model checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.
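
The abstract names the earliest-deadline-first (EDF) scheduler as one source of advice for guiding exploration. The following minimal Python sketch illustrates only the EDF rule itself; the `Job` structure and `edf_pick` helper are hypothetical names introduced here for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): the earliest-deadline-first
# (EDF) rule, used in the paper as one source of advice for guiding
# Monte-Carlo tree search. Job and edf_pick are hypothetical names.
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    task_id: int    # which task released this job
    remaining: int  # computation time still required
    deadline: int   # time units left until the job's deadline


def edf_pick(pending: list[Job]) -> Optional[Job]:
    """Return the pending job with the earliest deadline, or None if idle."""
    active = [job for job in pending if job.remaining > 0]
    return min(active, key=lambda job: job.deadline, default=None)


# Example: two pending jobs; EDF schedules the one whose deadline is closest.
jobs = [Job(task_id=0, remaining=2, deadline=5),
        Job(task_id=1, remaining=1, deadline=3)]
print(edf_pick(jobs))  # Job(task_id=1, remaining=1, deadline=3)
```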
