Paper Title
TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications
Paper Authors
Paper Abstract
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed efficient error-resilience techniques (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI, a high-level fault injection (FI) framework for TensorFlow-based applications. TensorFI can inject both hardware and software faults into general TensorFlow programs. It is a configurable FI tool that is flexible, easy to use, and portable. It can be integrated into existing TensorFlow programs to assess their resilience to different fault types (e.g., faults in particular operators). We use TensorFI to evaluate the resilience of 12 ML programs, including DNNs used in the autonomous vehicle domain. Our tool is publicly available at https://github.com/DependableSystemsLab/TensorFI.