Training a quantum annealing based restricted Boltzmann machine on cybersecurity data

Paper title

Training a quantum annealing based restricted Boltzmann machine on cybersecurity data

Authors

Dixit, Vivek; Selvarajan, Raja; Aldwairi, Tamer; Koshka, Yaroslav; Novotny, Mark A.; Humble, Travis S.; Alam, Muhammad A.; Kais, Sabre

Abstract

We present a real-world application that uses a quantum computer. Specifically, we train a restricted Boltzmann machine (RBM) using quantum annealing (QA) for cybersecurity applications, with QA implemented on the D-Wave 2000Q. RBMs are trained on the ISCX data, a benchmark dataset for cybersecurity. For comparison, RBMs are also trained using contrastive divergence (CD), a commonly used method for RBM training. Our analysis of the ISCX data shows that the dataset is imbalanced. We present two schemes to balance the training dataset before feeding it to a classifier. The first scheme is based on undersampling the benign instances: the imbalanced training dataset is divided into five sub-datasets, each used for training separately, and a majority vote is then taken to get the result. The majority vote increases the classification accuracy from 90.24% to 95.68% in the case of CD, and from 74.14% to 80.04% in the case of QA. In the second scheme, an RBM is used to generate synthetic data to balance the training dataset. We show that both QA-trained and CD-trained RBMs can be used to generate useful synthetic data. The balanced training data is used to evaluate several classifiers. Among those investigated, K-Nearest Neighbor (KNN) and Neural Network (NN) perform better than the others; both show an accuracy of 93%. Our results provide a proof of concept that a QA-based RBM can be trained on a 64-bit binary dataset. This illustrative example suggests that many practical classification problems could be migrated to QA-based techniques. Further, we show that synthetic data generated from an RBM can be used to balance the original dataset.
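The first balancing scheme (undersample the benign class, train on five sub-datasets, then take a majority vote) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each sub-dataset pairs a disjoint fifth of the benign instances with all attack instances, which the abstract does not specify. The `NearestCentroid` class is a hypothetical stand-in for the RBM-based classifier, and the 64-bit binary data is randomly generated rather than taken from ISCX.

```python
import numpy as np


def undersampled_subsets(X, y, n_subsets=5, seed=0):
    """Assumed scheme: split the majority class (label 0, benign) into
    n_subsets disjoint chunks and pair each chunk with every minority
    (label 1, attack) instance."""
    rng = np.random.default_rng(seed)
    majority = rng.permutation(np.flatnonzero(y == 0))
    minority = np.flatnonzero(y == 1)
    subsets = []
    for chunk in np.array_split(majority, n_subsets):
        idx = np.concatenate([chunk, minority])
        subsets.append((X[idx], y[idx]))
    return subsets


class NearestCentroid:
    """Hypothetical placeholder classifier (NOT an RBM): predicts the
    class whose mean feature vector is closest to the input."""

    def fit(self, X, y):
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def predict(self, X):
        diffs = X[:, None, :] - self.centroids_[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)


def majority_vote(models, X):
    """Label a sample 1 when more than half of the models predict 1."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return (2 * votes.sum(axis=0) > len(models)).astype(int)


def make_data(n_benign, n_attack, rng, n_bits=64):
    """Synthetic 64-bit binary records; the bit probabilities are arbitrary."""
    benign = rng.random((n_benign, n_bits)) < 0.3
    attack = rng.random((n_attack, n_bits)) < 0.6
    X = np.vstack([benign, attack]).astype(int)
    y = np.concatenate([np.zeros(n_benign, int), np.ones(n_attack, int)])
    return X, y


def demo(seed=0):
    rng = np.random.default_rng(seed)
    X_train, y_train = make_data(500, 100, rng)  # imbalanced 5:1
    X_test, y_test = make_data(100, 100, rng)
    models = [NearestCentroid().fit(Xs, ys)
              for Xs, ys in undersampled_subsets(X_train, y_train)]
    return float((majority_vote(models, X_test) == y_test).mean())


if __name__ == "__main__":
    print(f"majority-vote accuracy on toy data: {demo():.3f}")
```

Using an odd number of sub-datasets, as the abstract does with five, means a binary majority vote can never tie.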
