Paper Title
A Probabilistic Framework for Mutation Testing in Deep Neural Networks
Paper Authors
Paper Abstract
Context: Mutation Testing (MT) is an important tool in traditional Software Engineering (SE) white-box testing. It artificially injects faults into a system to evaluate a test suite's capability to detect them, under the assumption that the test suite's defect-finding capability will translate into an ability to detect real faults. While MT has long been used in SE, it has only recently started gaining the attention of the Deep Learning (DL) community, with researchers adapting it to improve the testability of DL models and the trustworthiness of DL systems. Objective: Although several techniques have been proposed for MT, most of them neglect the stochasticity inherent to DL training. Even the latest MT approaches in DL, which tackle MT statistically, can yield inconsistent results: because their statistic is computed over a fixed set of sampled trained instances, the decision can vary from one sampled set to another, when it should be consistent regardless of the sample. Methods: In this work, we propose a Probabilistic Mutation Testing (PMT) approach that alleviates this inconsistency problem and allows for a more consistent decision on whether a mutant is killed or not. Results: Through an evaluation using three models and eight mutation operators from previously proposed MT methods, we show that PMT effectively enables more consistent and informed decisions on mutations. We also analyze the trade-off between the approximation error and the cost of our method, showing that a relatively small error can be achieved at a manageable cost. Conclusion: Our results show the limitations of current MT practices for DNNs and the need to rethink them. We believe PMT is a first step in that direction, as it effectively removes the inconsistency across test executions that previous methods exhibit due to the stochasticity of DNN training.
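
The abstract contrasts a fixed-sample statistical kill decision with a probabilistic one. The sketch below is not the paper's algorithm; it is a minimal illustration of that contrast under assumed inputs. The function names (killed_fixed_sample, prob_mutant_killed), the margin parameter min_drop, the Welch t-test baseline, the bootstrap estimator, and all accuracy values are hypothetical choices made for this example, standing in for whatever statistic and estimator PMT actually uses.

import numpy as np
from scipy.stats import ttest_ind

def killed_fixed_sample(acc_orig, acc_mut, alpha=0.05):
    # Fixed-sample decision: one Welch t-test on a single sample of trained
    # instances. The binary outcome can flip when a different sample is drawn,
    # which is the inconsistency the abstract criticizes.
    _, p_value = ttest_ind(acc_orig, acc_mut, equal_var=False)
    return p_value < alpha

def prob_mutant_killed(acc_orig, acc_mut, n_boot=10_000, min_drop=0.01, seed=0):
    # Probabilistic decision (illustrative, not the paper's estimator):
    # bootstrap over the trained instances to estimate the probability that
    # the mutant's mean accuracy drop exceeds min_drop, yielding a graded
    # kill probability rather than a sample-dependent yes/no.
    rng = np.random.default_rng(seed)
    boot_orig = rng.choice(acc_orig, size=(n_boot, len(acc_orig)), replace=True).mean(axis=1)
    boot_mut = rng.choice(acc_mut, size=(n_boot, len(acc_mut)), replace=True).mean(axis=1)
    return float(np.mean(boot_orig - boot_mut > min_drop))

# Hypothetical test-set accuracies of independently trained instances of the
# original and mutated models (made up for illustration).
acc_orig = np.array([0.912, 0.908, 0.915, 0.910, 0.907])
acc_mut = np.array([0.881, 0.905, 0.870, 0.899, 0.874])

print("fixed-sample kill decision:", killed_fixed_sample(acc_orig, acc_mut))
print("P(mutant killed) ~", prob_mutant_killed(acc_orig, acc_mut))
# Under a probabilistic scheme, the mutant is declared killed only if this
# probability clears a chosen threshold (e.g. 0.9), which keeps the decision
# stable when the set of trained instances is re-sampled.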