Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

Paper Title

Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

Authors

Benjamin Steenhoek, Hongyang Gao, Wei Le

Abstract

Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient way to capture the code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose to combine such causal-based vulnerability detection algorithms with deep learning, aiming to achieve more efficient and effective vulnerability detection. Specifically, we designed DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient. DeepDFA outperformed all non-transformer baselines. It was trained in 9 minutes, 75x faster than the highest-performing baseline model. When trained on only 50+ vulnerable examples and several hundred total examples, the model retained the same performance as when trained on 100% of the dataset. DeepDFA also generalized to real-world vulnerabilities in DbgBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities. By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with 96.46 F1 score, 97.82 precision, and 95.14 recall. Our replication package is located at https://doi.org/10.6084/m9.figshare.21225413.
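
As a quick consistency check on the reported Big-Vul numbers, the F1 score can be recomputed from the stated precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function and variable names below are illustrative and do not come from the paper or its replication package.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (percentages).
reported_precision = 97.82
reported_recall = 95.14
reported_f1 = 96.46

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # 96.46
```

Under that assumption, the recomputed value agrees with the reported F1 to two decimal places.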
