Paper Title
Systematic Training and Testing for Machine Learning Using Combinatorial Interaction Testing
Paper Authors
Paper Abstract
This paper demonstrates the systematic use of combinatorial coverage for selecting and characterizing test and training sets for machine learning models. The presented work adapts combinatorial interaction testing, which has been used successfully to identify faults in software testing, to characterize the data used in machine learning. The MNIST handwritten digits dataset is used to demonstrate that combinatorial coverage can be used to select test sets that stress machine learning model performance, to select training sets that lead to robust model performance, and to select data for fine-tuning models to new domains. The results thus position combinatorial coverage as a holistic approach to training and testing for machine learning. In contrast to prior work, which has focused on coverage of the internals of neural networks, this paper considers coverage over simple features derived from inputs and outputs. It therefore addresses the case where the supplier of test and training sets for machine learning models does not have intellectual property rights to the models themselves. Finally, the paper addresses prior criticism of combinatorial coverage and provides a rebuttal that advocates the use of coverage metrics in machine learning applications.
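To make the notion of combinatorial coverage over simple input and output features more concrete, the following is a minimal sketch of one common way to compute t-way coverage: the fraction of all possible t-way feature-value combinations that appear at least once in a dataset. The abstract does not specify the features or the exact metric the paper uses, so the function name `combinatorial_coverage`, the feature domains, and the sample rows are all hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def combinatorial_coverage(rows, domains, t=2):
    """Return the fraction of all t-way feature-value combinations seen in rows.

    rows:    iterable of tuples, one discretized feature value per position
    domains: list of the allowed values for each feature position
    t:       interaction strength (t=2 gives pairwise coverage)

    Assumes every value in rows belongs to the matching entry of domains.
    """
    covered = 0
    total = 0
    for feature_idx in combinations(range(len(domains)), t):
        # Number of value combinations that are possible for this feature subset.
        possible = 1
        for i in feature_idx:
            possible *= len(domains[i])
        # Distinct value combinations that actually occur in the data.
        seen = {tuple(row[i] for i in feature_idx) for row in rows}
        covered += len(seen)
        total += possible
    return covered / total


# Hypothetical discretized features (illustration only): two input-derived
# features and a three-valued label.
domains = [("low", "high"), ("thin", "thick"), (0, 1, 2)]
rows = [
    ("low", "thin", 0),
    ("high", "thick", 1),
    ("low", "thick", 2),
]

coverage = combinatorial_coverage(rows, domains, t=2)

# A dataset containing every feature-value combination reaches full coverage.
full_rows = list(product(*domains))
full_coverage = combinatorial_coverage(full_rows, domains, t=2)

print(f"2-way coverage of sample rows: {coverage:.4f}")
print(f"2-way coverage of exhaustive rows: {full_coverage:.4f}")
```

Because the metric depends only on feature values computed from the data, a sketch like this needs no access to the model's internals, which matches the setting the abstract describes for a supplier of test and training sets.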