Paper Title
Assessment of the Relative Importance of Different Hyper-parameters of LSTM for an IDS
Paper Authors
Paper Abstract
Recurrent deep learning language models like the LSTM are often used to provide advanced cyber-defense for high-value assets. The underlying assumption for using LSTM networks for malware detection is that the op-code sequence of a malware sample can be treated as a (spoken) language representation. However, there are differences between any spoken language (a sequence of words/sentences) and machine language (a sequence of op-codes). In this paper, we demonstrate that, due to these inherent differences, an LSTM model with its default configuration, as tuned for a spoken language, may not detect malware well (using its op-code sequence) unless the network's essential hyper-parameters are tuned appropriately. In the process, we also determine the relative importance of all the different hyper-parameters of an LSTM network as applied to malware detection using op-code sequence representations. We experimented with different configurations of LSTM networks, altering hyper-parameters such as the embedding size, the number of hidden layers, the number of LSTM units in a hidden layer, the pruning/padding length of the input vector, the activation function, and the batch size. We discovered that, owing to the greater complexity of the malware/machine language, the performance of an LSTM network configured for an Intrusion Detection System is very sensitive to the number of hidden layers, the input sequence length, and the choice of activation function. Also, for (spoken) language modeling, recurrent architectures by far outperform their non-recurrent counterparts. Therefore, we also assess how sequential DL architectures like the LSTM compare against non-sequential counterparts like the MLP-DNN for the purpose of malware detection.
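To make the tuned quantities concrete, the sketch below shows a minimal op-code-sequence LSTM classifier in Keras that exposes the hyper-parameters the abstract enumerates (embedding size, number of hidden layers, LSTM units per layer, input pruning/padding length, activation function, and batch size). This is an illustrative assumption, not the authors' implementation; the function name, default values, and data variables are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): an LSTM over op-code
# sequences for binary malware detection, with the abstract's hyper-parameters
# exposed as arguments.
import tensorflow as tf

def build_opcode_lstm(vocab_size,           # number of distinct op-codes (assumed)
                      embedding_size=64,    # embedding dimension
                      num_hidden_layers=2,  # number of stacked LSTM layers
                      lstm_units=128,       # LSTM units per hidden layer
                      activation="tanh"):   # cell activation function
    model = tf.keras.Sequential()
    # Map each op-code token to a dense vector of size embedding_size.
    model.add(tf.keras.layers.Embedding(vocab_size, embedding_size))
    for i in range(num_hidden_layers):
        # All but the last LSTM layer return full sequences so layers can stack.
        model.add(tf.keras.layers.LSTM(
            lstm_units,
            activation=activation,
            return_sequences=(i < num_hidden_layers - 1)))
    # Binary output: malware vs. benign.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage (hypothetical data): op-code sequences pruned/padded to a fixed
# length of 500 tokens, trained with batch_size=32.
# x_train = tf.keras.preprocessing.sequence.pad_sequences(opcode_seqs, maxlen=500)
# model = build_opcode_lstm(vocab_size=256)
# model.fit(x_train, y_train, batch_size=32, epochs=10, validation_split=0.1)
```

The pruning/padding length enters through the preprocessing step (`maxlen` in the usage comment), while the remaining hyper-parameters are plain constructor arguments, which is one straightforward way to grid-search the configurations the abstract describes.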