Cross Version Defect Prediction with Class Dependency Embeddings

Paper Title

Cross Version Defect Prediction with Class Dependency Embeddings

Authors

Cohen, Moti; Rokach, Lior; Puzis, Rami

Abstract

Software defect prediction aims to predict which software modules are most likely to contain defects. The idea behind this approach is to save time during development by helping developers find bugs early. Defect prediction models are built from historical data; specifically, one can use data collected from past software distributions, or versions, of the same target application under analysis. Defect prediction based on past versions is called Cross Version Defect Prediction (CVDP). Traditionally, static code metrics are used to predict defects. In this work, we use the Class Dependency Network (CDN) as an additional predictor of defects, combined with static code metrics. CDN data contains structural information about the target application being analyzed. Usually, CDN data is analyzed using various handcrafted network measures, such as social network metrics. Our approach uses network embedding techniques to leverage CDN information without having to build the metrics manually. To use the embeddings across versions, we incorporate different embedding alignment techniques. To evaluate our approach, we performed experiments on 24 software release pairs and compared it against several benchmark methods. In these experiments, we analyzed the performance of two different graph embedding techniques, three anchor selection approaches, and two alignment techniques. We also built a meta-model based on two different embeddings and achieved a statistically significant improvement in AUC of 4.7% (p < 0.002) over the baseline method.
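The abstract states that embeddings from different versions must be aligned using anchor classes, but it does not spell out the alignment techniques. The following is a minimal sketch under stated assumptions: each class embedding is a plain vector keyed by class name, classes present in both releases act as anchors, and the alignment is an unconstrained least-squares linear map fitted on those anchors. This is not necessarily one of the two alignment techniques evaluated in the paper, and the names `fit_alignment` and `align` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more (or less degenerate) anchors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_alignment(old: Dict[str, Vector], new: Dict[str, Vector],
                  anchors: List[str]) -> Matrix:
    """Hypothetical helper: least-squares W with old[c] @ W ~= new[c] for anchors c."""
    x = [old[c] for c in anchors]
    y = [new[c] for c in anchors]
    xt = transpose(x)
    # Normal equations: (X^T X) W = X^T Y
    return solve(matmul(xt, x), matmul(xt, y))


def align(old: Dict[str, Vector], w: Matrix) -> Dict[str, Vector]:
    """Hypothetical helper: map every old-release embedding into the new space."""
    return {c: matmul([v], w)[0] for c, v in old.items()}


if __name__ == "__main__":
    # Toy data: the new release's space is the old one rotated by 90 degrees.
    old = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0], "D": [2.0, -1.0]}
    new = {"A": [0.0, 1.0], "B": [-1.0, 0.0], "C": [-1.0, 1.0]}
    w = fit_alignment(old, new, anchors=["A", "B", "C"])
    print(align(old, w)["D"])  # → [1.0, 2.0]
```

Class `D` is not an anchor, yet its old-release vector is mapped into the new release's coordinate system, so features learned on one version can be reused on the next. An orthogonal (Procrustes) constraint on `W` is a common alternative that preserves distances between embeddings; the unconstrained fit is used here only because it needs no SVD.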
