Paper Title

LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries

Authors

Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, Dongmei Zhang

Abstract

Third-party libraries (TPLs) are reused frequently in software applications for reducing development cost. However, they could introduce security risks as well. Many TPL detection methods have been proposed to detect TPL reuse in Android bytecode or in source code. This paper focuses on detecting TPL reuse in binary code, which is a more challenging task. For a detection target in binary form, libraries may be compiled and linked to separate dynamic-link files or built into a fused binary that contains multiple libraries and project-specific code. This could result in fewer available code features and lower the effectiveness of feature engineering. In this paper, we propose a binary TPL reuse detection framework, LibDB, which can effectively and efficiently detect imported TPLs even in stripped and fused binaries. In addition to the basic and coarse-grained features (string literals and exported function names), LibDB utilizes function contents as a new type of feature. It embeds all functions in a binary file to low-dimensional representations with a trained neural network. It further adopts a function call graph-based comparison method to improve the accuracy of the detection. LibDB is able to support version identification of TPLs contained in the detection target, which is not considered by existing detection methods. To evaluate the performance of LibDB, we construct three datasets for binary-based TPL reuse detection. Our experimental results show that LibDB is more accurate and efficient than state-of-the-art tools on the binary TPL detection task and the version identification task. Our datasets and source code used in this work are anonymously available at https://github.com/DeepSoftwareAnalytics/LibDB.
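The abstract names two kinds of evidence: coarse-grained features (string literals and exported function names) and per-function embeddings. The sketch below shows one way such evidence could be scored against a single candidate library. It is an illustration under stated assumptions, not LibDB's implementation: the helper names (`containment`, `cosine`, `matched_function_ratio`) and the 0.9 similarity threshold are hypothetical, the embedding vectors are toy values rather than neural network output, and the function call graph-based comparison step is omitted.

```python
import math


def containment(target_items, library_items):
    """Fraction of the library's items (strings or exported names) found in the target."""
    library_items = set(library_items)
    if not library_items:
        return 0.0
    return len(library_items & set(target_items)) / len(library_items)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matched_function_ratio(target_vecs, library_vecs, threshold=0.9):
    """Share of library functions whose best match in the target reaches the threshold.

    The threshold value is an assumption chosen for this example.
    """
    if not library_vecs:
        return 0.0
    hits = sum(
        1
        for lib_vec in library_vecs
        if any(cosine(lib_vec, tgt_vec) >= threshold for tgt_vec in target_vecs)
    )
    return hits / len(library_vecs)


# Toy data: a stripped, fused binary still exposes string literals,
# so coarse features can match even when symbol names are gone.
target_strings = {"inflate error", "deflate 1.2.11", "usage: app"}
library_strings = {"inflate error", "deflate 1.2.11"}
string_score = containment(target_strings, library_strings)

# Toy 2-dimensional "embeddings" standing in for learned function vectors.
target_funcs = [[1.0, 0.0], [0.0, 1.0]]
library_funcs = [[1.0, 0.0], [1.0, 1.0]]
function_score = matched_function_ratio(target_funcs, library_funcs)

print(string_score, function_score)
```

Scoring by how much of the library appears in the target, rather than by symmetric overlap, suits fused binaries: the target may contain several libraries plus project-specific code, so unmatched material in the target should not count against a candidate.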
