Paper Title
Model-based Exploration of the Frontier of Behaviours for Deep Learning System Testing
Paper Authors
Paper Abstract
With the increasing adoption of Deep Learning (DL) for critical tasks, such as autonomous driving, the evaluation of the quality of systems that rely on DL has become crucial. Once trained, DL systems produce an output for any arbitrary numeric vector provided as input, regardless of whether it is within or outside the validity domain of the system under test. Hence, the quality of such systems is determined by the intersection between their validity domain and the regions where their outputs exhibit a misbehaviour. In this paper, we introduce the notion of frontier of behaviours, i.e., the inputs at which the DL system starts to misbehave. If the frontier of misbehaviours is outside the validity domain of the system, the quality check is passed. Otherwise, the inputs at the intersection represent quality deficiencies of the system. We developed DeepJanus, a search-based tool that generates frontier inputs for DL systems. The experimental results obtained for the lane-keeping component of a self-driving car show that the frontier of a well-trained system contains almost exclusively unrealistic roads that violate the best practices of civil engineering, while the frontier of a poorly trained one includes many valid inputs that point to serious deficiencies of the system.
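To make the frontier idea concrete, here is a minimal toy sketch in Python. It is not DeepJanus and does not reproduce its search-based approach: it uses plain bisection on a one-dimensional input, and every name in it (`find_frontier_pair`, `toy_lane_keeper_misbehaves`, `is_valid_road`, and the curvature thresholds 0.30 and 0.20) is a hypothetical assumption chosen only for illustration. It shows a pair of nearby inputs, one on each side of the frontier, and the quality check described in the abstract: the check passes when the misbehaving side of the frontier lies outside the validity domain.

```python
from typing import Callable, Tuple


def find_frontier_pair(
    misbehaves: Callable[[float], bool],
    passing: float,
    failing: float,
    tol: float = 1e-3,
) -> Tuple[float, float]:
    """Bisect between a passing and a failing input until they are within tol.

    Returns (last passing input, first failing input), i.e. a pair of close
    inputs that straddles the frontier of behaviours in this 1-D toy setting.
    """
    if misbehaves(passing) or not misbehaves(failing):
        raise ValueError("need one passing and one failing starting input")
    while abs(failing - passing) > tol:
        mid = (passing + failing) / 2
        if misbehaves(mid):
            failing = mid  # frontier is between passing and mid
        else:
            passing = mid  # frontier is between mid and failing
    return passing, failing


def toy_lane_keeper_misbehaves(curvature: float) -> bool:
    # Hypothetical system under test: leaves the lane on bends sharper than 0.30.
    return curvature > 0.30


def is_valid_road(curvature: float) -> bool:
    # Hypothetical validity domain: realistic roads have curvature in [0, 0.20].
    return 0.0 <= curvature <= 0.20


inside, outside = find_frontier_pair(toy_lane_keeper_misbehaves, 0.0, 1.0)

# Quality check from the abstract: passed if the frontier input that
# misbehaves is outside the validity domain.
quality_check_passed = not is_valid_road(outside)
```

In this toy setting the system only fails on roads sharper than any valid road, so the check passes. Lowering the hypothetical failure threshold below 0.20 would place the frontier inside the validity domain and flag a quality deficiency.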