Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection

Title

Ensemble Classifier Design Tuned to Dataset Characteristics for Network Intrusion Detection

Authors

Zoghi, Zeinab; Serpen, Gursel

Abstract

Machine learning-based supervised approaches require highly customized and fine-tuned methodologies to deliver outstanding performance. This paper presents a dataset-driven design and performance evaluation of a machine learning classifier for the network intrusion dataset UNSW-NB15. Analysis of the dataset suggests that it suffers from imbalanced class representation and from class overlap in the feature space. We employed ensemble methods using Balanced Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest empowered by Hellinger Distance Decision Trees (RF-HDDT). BB and XGBoost are tuned to handle the imbalanced data, and the Random Forest (RF) classifier is supplemented by the Hellinger metric to address the imbalance issue. Two new algorithms are proposed to address the class overlap issue in the dataset. These two algorithms help improve performance on the testing dataset by modifying the final classification decisions that the three base classifiers make within the ensemble classifier, which uses a majority vote combiner. The proposed design is evaluated for both binary and multi-category classification. A comparison with models reported in the literature on the same dataset demonstrates that the proposed model outperforms them by a significant margin in both the binary and the multi-category classification cases.
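Two mechanisms named in the abstract can be made concrete with a short sketch: a skew-insensitive Hellinger split criterion of the kind used in HDDT, and a majority vote combiner over the three base classifiers. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. `hellinger_split_score` and `majority_vote` are hypothetical names. The split score is the common binary-class form; the abstract does not give the authors' exact formula or its multi-category extension. Ties in the vote go to the first classifier's label, a rule the abstract does not specify. The two class-overlap algorithms are not described in the abstract and are not modelled here.

```python
import math
from collections import Counter
from typing import List, Sequence


def hellinger_split_score(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the class-conditional distributions over
    the branches of a candidate split (common binary-class HDDT form).

    pos_counts[j] / neg_counts[j] are the numbers of positive / negative
    training samples routed to branch j. Each class is normalised by its
    own total, so the score does not depend on the class ratio.
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # with one class absent, the split cannot separate classes
    s = 0.0
    for p, n in zip(pos_counts, neg_counts):
        s += (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
    return math.sqrt(s)  # ranges from 0 (no separation) to sqrt(2) (perfect)


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-sample labels from several base classifiers.

    predictions[k][i] is the label classifier k assigns to sample i.
    ASSUMPTION: ties (e.g. three different labels) go to the label seen
    first, i.e. the earliest classifier's vote; Counter.most_common orders
    equal counts by first occurrence.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

For example, `hellinger_split_score([9, 1], [10, 80])` scores a split that routes 9 of 10 minority samples to one branch. Multiplying every majority-class count by 10 leaves the score unchanged, and this skew insensitivity is why the criterion is paired with imbalanced data.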