Paper Title
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model
Paper Authors
Paper Abstract
Anti-malware engines are the first line of defense against malicious software. While widely used, feature engineering-based anti-malware engines are vulnerable to unseen (zero-day) attacks. Recently, deep learning-based static anti-malware detectors have achieved success in identifying unseen attacks without requiring feature engineering and dynamic analysis. However, these detectors are susceptible to malware variants with slight perturbations, known as adversarial examples. Generating effective adversarial examples is useful to reveal the vulnerabilities of such systems. Current methods for launching such attacks require accessing either the specifications of the targeted anti-malware model, the confidence score of the anti-malware response, or dynamic malware analysis, which are either unrealistic or expensive. We propose MalRNN, a novel deep learning-based approach to automatically generate evasive malware variants without any of these restrictions. Our approach features an adversarial example generation process, which learns a language model via a generative sequence-to-sequence recurrent neural network to augment malware binaries. MalRNN effectively evades three recent deep learning-based malware detectors and outperforms current benchmark methods. Findings from applying MalRNN to a real-world dataset with eight malware categories are discussed.
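The abstract describes learning a byte-level language model from binaries and using its output to augment a malware file so that a static detector misclassifies it. The sketch below illustrates that general idea only, assuming the common trick of appending generated bytes to a file's overlay (which does not change a PE file's runtime behavior). It substitutes a trivial byte-bigram model for the paper's sequence-to-sequence RNN; the class and function names (`ByteBigramModel`, `append_adversarial_bytes`) are hypothetical, not from the paper.

```python
import random
from collections import Counter, defaultdict


class ByteBigramModel:
    """Toy byte-level language model: a bigram stand-in for the
    seq2seq RNN described in the abstract, trained on benign bytes."""

    def __init__(self) -> None:
        # transitions[prev_byte] -> Counter of next-byte frequencies
        self.transitions: defaultdict[int, Counter] = defaultdict(Counter)

    def train(self, data: bytes) -> None:
        for prev, nxt in zip(data, data[1:]):
            self.transitions[prev][nxt] += 1

    def sample(self, context: int, length: int, rng: random.Random) -> bytes:
        """Generate `length` bytes, conditioning each on the previous one."""
        out, prev = [], context
        for _ in range(length):
            counts = self.transitions.get(prev)
            if counts:
                choices, weights = zip(*counts.items())
                prev = rng.choices(choices, weights=weights)[0]
            else:
                prev = rng.randrange(256)  # unseen context: fall back to uniform
            out.append(prev)
        return bytes(out)


def append_adversarial_bytes(malware: bytes, model: ByteBigramModel,
                             n_bytes: int, seed: int = 0) -> bytes:
    """Append model-generated bytes after the binary's last byte;
    functionality-preserving for PE files since the loader ignores
    overlay data past the declared sections."""
    rng = random.Random(seed)
    payload = model.sample(malware[-1], n_bytes, rng)
    return malware + payload


# Usage: train on a placeholder "benign" corpus, then augment a stub binary.
model = ByteBigramModel()
model.train(b"placeholder benign corpus for demonstration only")
variant = append_adversarial_bytes(b"MZ\x90\x00", model, n_bytes=64)
assert variant[:4] == b"MZ\x90\x00" and len(variant) == 68
```

In the actual attack loop the generated payload would be re-scored by the target detector in a black-box fashion, and the model retrained until the variant evades detection; this sketch only shows the generate-and-append step.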