Paper Title
Bin2vec: Learning Representations of Binary Executable Programs for Security Tasks
Paper Authors
Paper Abstract
Tackling binary program analysis problems has traditionally meant manually defining rules and heuristics, a tedious and time-consuming task for human analysts. To improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs that are applicable to a number of downstream tasks. We introduce Bin2vec, a new approach that leverages Graph Convolutional Networks (GCN) together with computational program graphs to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as to published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49,191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs, each including at least 100 CVE entries, for the vulnerability discovery task. Although it works on binary code, Bin2vec sets a new state-of-the-art result, reducing the classification error by 40% relative to the source-code-based inst2vec approach. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).
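
To make the idea of a graph convolution over a program graph concrete, here is a minimal sketch in plain Python. It uses the widely known propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} · H · W) followed by mean pooling to obtain one vector per graph. This is an illustration under stated assumptions, not the Bin2vec architecture: the toy graph, the feature and weight values, the undirected edges, the mean-pooling readout, and all function names are hypothetical and are not taken from the paper.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    hidden = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


def graph_embedding(node_vectors):
    """Mean-pool node vectors into one fixed-size vector (assumed readout)."""
    n = len(node_vectors)
    return [sum(column) / n for column in zip(*node_vectors)]


# Hypothetical toy graph: three nodes on a path 0 - 1 - 2, treated as undirected.
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical 2-dimensional input features, one row per node.
FEATURES = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Hypothetical weights mapping 2 input features to 3 hidden features.
# In a real model these would be learned by training on a downstream task.
WEIGHTS = [[0.5, -0.2, 0.1],
           [0.3, 0.8, -0.5]]

norm_adj = normalize_adjacency(ADJ)
node_vectors = gcn_layer(norm_adj, FEATURES, WEIGHTS)
embedding = graph_embedding(node_vectors)
print(embedding)
```

In a trained model, a vector like `embedding` would be passed to a classifier for a downstream task such as algorithm classification or vulnerability detection. Real program graphs are typically directed and have several edge types, which this sketch deliberately ignores.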