An Efficient Multi-Step Framework for Malware Packing Identification

Paper Title

An Efficient Multi-Step Framework for Malware Packing Identification

Authors

Jong-Wouk Kim, Yang-Sae Moon, Mi-Jung Choi

Abstract

Malware developers use combinations of techniques such as compression, encryption, and obfuscation to bypass anti-virus software. Malware that uses anti-analysis techniques can bypass AI-based anti-virus software and malware analysis tools. Classifying packed files is therefore one of the biggest challenges. Problems arise if malware classifiers learn the features of packers rather than those of the malware itself. Training models on such unintended, erroneous data can turn into poisoning, adversarial, and evasion attacks. Researchers should therefore take packing into account to build appropriate malware classifier models. In this paper, we propose a multi-step framework for classifying and identifying packed samples, which consists of pseudo-optimal feature selection, machine learning-based classification, and packer identification steps. In the first step, we use the CART algorithm and permutation importance to preselect the 20 most important features. In the second step, each model learns the 20 preselected features to classify packed files, so that the best-performing combination can be identified. As a result, XGBoost trained on the features preselected by XGBoost with permutation importance showed the highest performance of all experimental scenarios, with an accuracy of 99.67%, an F1-score of 99.46%, and an area under the curve (AUC) of 99.98%. In the third step, we propose a new approach that identifies packers only for samples classified as Well-Known Packed.
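Permutation importance, used in the first step to preselect features, scores each feature by how much a trained model's accuracy drops when that feature's column is randomly shuffled. The following is a minimal, dependency-free Python sketch of that idea combined with top-k selection; it is not the authors' implementation. The feature names, the toy data, and the threshold rule `toy_predict` (a hypothetical stand-in for a trained CART or XGBoost model) are illustrative assumptions, and k would be 20 in the paper's setting. In practice, scikit-learn offers `sklearn.inspection.permutation_importance` for the same purpose.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Sequence[Row]], List[int]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Predictor, X: List[Row], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def preselect_top_k(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features (ties keep index order)."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: [section_entropy, import_count, random_noise]
X = [
    [7.9, 4, 0.3], [7.6, 2, 0.9], [7.2, 5, 0.5], [7.4, 3, 0.1],
    [5.1, 40, 0.7], [6.0, 55, 0.2], [4.8, 38, 0.6], [6.3, 61, 0.4],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = packed, 0 = not packed


def toy_predict(rows: Sequence[Row]) -> List[int]:
    # Hypothetical stand-in for a trained model: "packed" when entropy > 7.0.
    return [1 if row[0] > 7.0 else 0 for row in rows]


importances = permutation_importance(toy_predict, X, y)
print("importances:", importances)
print("preselected feature indices:", preselect_top_k(importances, k=2))
```

Because `toy_predict` only looks at the entropy column, shuffling the other two columns leaves its predictions unchanged, so their importance is exactly zero and the entropy feature is ranked first.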
