Paper Title
Filtering DDoS Attacks from Unlabeled Network Traffic Data Using Online Deep Learning
Authors
Abstract
DDoS attacks are simple, effective, and still pose a significant threat even after more than two decades. Given the recent success of machine learning, it is interesting to investigate how deep learning can be leveraged to filter out application-layer attack requests. Adopting deep learning solutions is challenging because of ever-changing traffic profiles, the lack of labeled data, and the constraints of the online setting. Offline unsupervised learning methods can sidestep these hurdles by learning an anomaly detector $N$ from the normal-day traffic ${\mathcal N}$. However, anomaly detectors do not exploit information acquired during attacks, and their performance is typically unsatisfactory. In this paper, we propose two frameworks that use both the historical normal traffic ${\mathcal N}$ and the mixture traffic ${\mathcal M}$ obtained during attacks, which consists of unlabeled requests. We also introduce a machine learning optimization problem that aims to sift out the attacks using ${\mathcal N}$ and ${\mathcal M}$. First, our proposed approach, inspired by statistical methods, extends an unsupervised anomaly detector $N$ to solve the problem using estimated conditional probability distributions. We adopt transfer learning to apply $N$ to ${\mathcal N}$ and ${\mathcal M}$ separately and efficiently, and combine the results to obtain an online learner. Second, we formulate a specific loss function better suited to deep learning and use iterative training to solve it in the online setting. On publicly available datasets, our online learners achieve a $99.3\%$ improvement in false-positive rates compared to the baseline detection methods. In the offline setting, our approaches are competitive with classifiers trained on labeled data.
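The abstract does not give the estimator itself, so the following is only a minimal sketch of the statistical idea behind combining ${\mathcal N}$ and ${\mathcal M}$, not the authors' method. It assumes each request can be reduced to a discrete key (here, a URL path), that the attack fraction $\alpha$ of the mixture is known, and that the mixture satisfies $p_{\mathcal M}(x) = (1-\alpha)\,p_{\mathcal N}(x) + \alpha\,p_{\mathcal A}(x)$, so that $P(\text{normal} \mid x) = (1-\alpha)\,p_{\mathcal N}(x)/p_{\mathcal M}(x)$. Smoothed histograms stand in for the deep anomaly detector described in the paper, and the names `estimate_pmf`, `normal_posterior`, `filter_requests`, and `attack_fraction` are hypothetical.

```python
from collections import Counter


def estimate_pmf(samples, support, smoothing=1.0):
    """Laplace-smoothed empirical distribution over a finite support."""
    counts = Counter(samples)
    total = len(samples) + smoothing * len(support)
    return {x: (counts[x] + smoothing) / total for x in support}


def normal_posterior(normal_samples, mixture_samples, attack_fraction, smoothing=1.0):
    """Estimate P(normal | x) for every key seen in either sample set.

    Assumption (not from the paper): attack_fraction is known, and the
    mixture density is (1 - a) * p_N + a * p_A, which gives
    P(normal | x) = (1 - a) * p_N(x) / p_M(x).
    """
    support = set(normal_samples) | set(mixture_samples)
    p_n = estimate_pmf(normal_samples, support, smoothing)
    p_m = estimate_pmf(mixture_samples, support, smoothing)
    # Estimation noise can push the ratio above 1, so clip it to a probability.
    return {
        x: min(1.0, (1.0 - attack_fraction) * p_n[x] / p_m[x])
        for x in support
    }


def filter_requests(requests, posterior, threshold=0.5):
    """Keep requests whose estimated probability of being normal meets the threshold.

    Keys never seen in either sample set are treated as attacks (posterior 0.0);
    this is a conservative choice made for the sketch.
    """
    return [r for r in requests if posterior.get(r, 0.0) >= threshold]
```

In this sketch, a key that is much more frequent in the mixture than in historical normal traffic receives a low posterior and is filtered out, while keys whose relative frequency is unchanged apart from the $1-\alpha$ dilution are kept.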