Paper Title
Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach
Paper Authors
Paper Abstract
Penetration Testing plays a critical role in evaluating the security of a target network by emulating real active adversaries. Deep Reinforcement Learning (RL) is seen as a promising solution to automating the process of penetration testing by reducing human effort and improving reliability. Existing RL solutions focus on finding a specific attack path to impact the target hosts. In reality, however, a diverse range of attack variations is needed to provide a comprehensive assessment of the target network's security level. Hence, attack agents must consider multiple objectives when penetrating the network. Nevertheless, this challenge is not adequately addressed in the existing literature. To this end, we formulate automatic penetration testing in the Multi-Objective Reinforcement Learning (MORL) framework and propose a Chebyshev decomposition critic to find diverse adversary strategies that balance different objectives in the penetration test. Additionally, the number of available actions grows as the agent continually probes the target network, making the training process intractable in many practical situations. Thus, we introduce a coverage-based masking mechanism that reduces attention on previously selected actions to help the agent adapt to future exploration. Experimental evaluation on a range of scenarios demonstrates the superiority of our proposed approach over adapted algorithms in terms of multi-objective learning and performance efficiency.
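The two core ideas in the abstract can be illustrated with a minimal sketch: a weighted Chebyshev scalarization that collapses a multi-objective reward vector into a single utility (the decomposition underlying a Chebyshev critic), and a coverage-based mask that penalizes already-selected actions. The function names, the utopia reference point, and the linear count penalty are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def chebyshev_scalarize(rewards, weights, utopia):
    """Weighted Chebyshev scalarization (illustrative).

    Returns the negative of the largest weighted distance to the
    utopia point, so maximizing this scalar pushes every objective
    toward the utopia point simultaneously rather than letting one
    objective dominate, as a linear weighted sum can.
    """
    rewards = np.asarray(rewards, dtype=float)
    return -float(np.max(weights * np.abs(utopia - rewards)))

def coverage_mask(logits, counts, penalty=1.0):
    """Coverage-based action masking (illustrative).

    Subtracts a penalty proportional to how often each action has
    already been selected, steering the policy's attention toward
    actions it has not yet tried.
    """
    return logits - penalty * counts

# Two objectives (e.g. asset value gained vs. stealth), equal weights:
u = chebyshev_scalarize(np.array([1.0, 2.0]),
                        np.array([0.5, 0.5]),
                        np.array([3.0, 3.0]))  # -> -1.0

# Three actions with equal logits; the unvisited action (index 1)
# becomes the greedy choice after masking by selection counts:
masked = coverage_mask(np.array([2.0, 2.0, 2.0]),
                       np.array([3.0, 0.0, 1.0]))
```

The Chebyshev form guarantees that, for suitable weights, every Pareto-optimal trade-off is reachable as a maximizer, which is what enables the behaviourally diverse strategies the abstract describes.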