Deep Learning based Vulnerability Detection: Are We There Yet?

Paper Title

Deep Learning based Vulnerability Detection: Are We There Yet?

Authors

Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray

Abstract

Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results achieving an accuracy of up to 95% at detecting vulnerabilities. In this paper, we ask, "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?". To our surprise, we find that their performance drops by more than 50%. A systematic investigation of what causes such precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of vulnerable classes, etc.) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable/function names, etc.). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline: up to 33.57% boost in precision and 128.38% boost in recall compared to the best performing model in the literature. Overall, this paper elucidates existing DL-based vulnerability prediction systems' potential issues and draws a roadmap for future DL-based vulnerability prediction research. In that spirit, we make available all the artifacts supporting our results: https://git.io/Jf6IA.
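The abstract names an "unrealistic distribution of vulnerable classes" as one training-data problem. The sketch below is a rough illustration of why that can matter, not the authors' evaluation code. It takes a hypothetical classifier with fixed true-positive and false-positive rates and computes its metrics on a balanced benchmark and on a set where vulnerable functions are rare. The function name `metrics`, the 90%/10% rates, and the sample counts are all assumptions chosen for illustration.

```python
def metrics(n_vuln, n_clean, tpr, fpr):
    """Return (accuracy, precision, recall) for a classifier with a fixed
    true-positive rate `tpr` and false-positive rate `fpr`.
    All inputs are hypothetical; nothing here comes from the paper's data."""
    tp = tpr * n_vuln          # vulnerable samples flagged correctly
    fn = n_vuln - tp           # vulnerable samples missed
    fp = fpr * n_clean         # clean samples flagged by mistake
    tn = n_clean - fp          # clean samples passed correctly
    accuracy = (tp + tn) / (n_vuln + n_clean)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Balanced benchmark: half of the samples are vulnerable (assumed split).
balanced = metrics(n_vuln=5000, n_clean=5000, tpr=0.9, fpr=0.1)

# Skewed set: only 5% of the samples are vulnerable (assumed split).
realistic = metrics(n_vuln=500, n_clean=9500, tpr=0.9, fpr=0.1)

for name, (acc, prec, rec) in [("balanced", balanced), ("realistic", realistic)]:
    print(f"{name:>9}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Under these assumed rates, accuracy and recall are the same in both settings, while precision falls sharply once vulnerable samples become rare. This is one way a model that scores well on a balanced benchmark can disappoint in a realistic prediction setting.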
