Paper Title
Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense
Paper Authors
Paper Abstract
The field of cybersecurity has mostly been a cat-and-mouse game, with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings when there is incomplete information about a rational adversary, and that they yield sub-optimal movement strategies. Further, while there exists an array of work on learning defense policies in sequential settings for cybersecurity, these approaches either suffer from scalability issues arising out of incomplete information or ignore the strategic nature of the adversary, simplifying the scenario so that single-agent reinforcement learning techniques can be used. To address these concerns, we propose (1) a unifying game-theoretic model, called Bayesian Stackelberg Markov Games (BSMGs), that can model uncertainty over attacker types and the nuances of an MTD system, and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that can, via interaction, learn the optimal movement policy for a BSMG within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries, even when prior information about rewards and transitions is absent.
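As an illustration of the kind of computation a BSS-Q-style learner performs at each state, the sketch below solves a Bayesian Stackelberg stage game for a Strong Stackelberg Equilibrium using the classic multiple-LPs enumeration (one linear program per tuple of per-type attacker best responses). This is a minimal sketch under stated assumptions, not the paper's implementation: the names (`solve_bayesian_sse`, `R_def`, `R_att`, `type_prob`) and the toy payoff matrices are illustrative. In a BSS-Q-style learner, the payoff matrices would be replaced by the current per-type Q-value estimates at a state, the defender would sample its next configuration move from the commitment strategy `x`, and the returned SSE value would serve as the bootstrap target for the temporal-difference update.

```python
# Minimal sketch (not the paper's code): Strong Stackelberg Equilibrium of a
# Bayesian stage game via multiple-LPs enumeration. Practical only for small
# action sets and few attacker types.
import itertools
import numpy as np
from scipy.optimize import linprog


def solve_bayesian_sse(R_def, R_att, type_prob):
    """Compute an SSE commitment for the defender (leader).

    R_def[t], R_att[t] : (n_def x n_att_t) payoff matrices for attacker type t.
    type_prob[t]       : prior probability of type t.
    Returns (defender mixed strategy x, per-type attacker responses, defender value).
    """
    n_def = R_def[0].shape[0]
    best_x, best_resp, best_val = None, None, -np.inf
    # Enumerate one candidate best-response action per attacker type.
    for resp in itertools.product(*[range(R_att[t].shape[1]) for t in range(len(R_att))]):
        # Defender maximizes type-weighted expected payoff; linprog minimizes, so negate.
        c = -sum(type_prob[t] * R_def[t][:, resp[t]] for t in range(len(R_def)))
        # Each chosen attacker action must be a best response for its type against x.
        A_ub, b_ub = [], []
        for t, j in enumerate(resp):
            for k in range(R_att[t].shape[1]):
                if k != j:
                    A_ub.append(R_att[t][:, k] - R_att[t][:, j])
                    b_ub.append(0.0)
        res = linprog(c,
                      A_ub=np.array(A_ub) if A_ub else None,
                      b_ub=np.array(b_ub) if b_ub else None,
                      A_eq=np.ones((1, n_def)), b_eq=np.array([1.0]),
                      bounds=[(0.0, 1.0)] * n_def, method="highs")
        # Keeping the feasible tuple with the highest leader value implements the
        # SSE convention that the follower breaks ties in the leader's favor.
        if res.success and -res.fun > best_val:
            best_x, best_resp, best_val = res.x, resp, -res.fun
    return best_x, best_resp, best_val


if __name__ == "__main__":
    # Toy instance: two attacker types, three defender configurations, two attacks each.
    R_def = [np.array([[3.0, -1.0], [0.0, 2.0], [1.0, 1.0]]),
             np.array([[2.0, 0.0], [-1.0, 3.0], [0.0, 1.0]])]
    R_att = [np.array([[-2.0, 1.0], [1.0, -1.0], [0.0, 0.0]]),
             np.array([[0.0, 2.0], [2.0, -1.0], [1.0, 0.0]])]
    x, resp, val = solve_bayesian_sse(R_def, R_att, type_prob=[0.6, 0.4])
    print("defender commitment:", np.round(x, 3), "| responses:", resp, "| value:", round(val, 3))
```

Note that this enumeration grows exponentially with the number of attacker types; it is shown only to convey the structure of the stage-game computation, not as a scalable solver.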