Paper Title

Hardware faults that matter: Understanding and Estimating the safety impact of hardware faults on object detection DNNs

Authors

Syed Qutub, Florian Geissler, Yang Peng, Ralf Grafe, Michael Paulitsch, Gereon Hinz, Alois Knoll

Abstract

Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a system's perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at the image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory, but can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric, IVMOD (Image-wise Vulnerability Metric for Object Detection), to quantify vulnerability based on incorrect image-wise object detection due to false positive (FP) or false negative (FN) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications: e.g., up to (much greater than) 100 FPs are generated, or up to approx. 90% of true positives (TPs) are lost in an image. Furthermore, with a single stuck-at-1 fault, an entire sequence of images can be affected, causing temporally persistent ghost detections that can be mistaken for actual objects (covering up to approx. 83% of the image), while actual objects in the scene are continuously missed (up to approx. 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads against hardware faults.
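The abstract contrasts object-level average precision with image-wise counting of fault-induced FPs and FNs. As a rough illustration only (the paper's exact IVMOD formulation is not given here, and the function names, greedy IoU matching, and 0.5 threshold below are assumptions), this sketch flips a single bit in a float32 value, modeling a memory soft error, and counts image-wise FPs/FNs by matching faulty detections against a fault-free baseline:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    # Toy soft-error model: reinterpret the float32 bit pattern,
    # toggle one bit (0 = LSB of mantissa, 31 = sign), reinterpret back.
    (i,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", i ^ (1 << bit)))[0]

def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def image_wise_fp_fn(baseline, faulty, iou_thr=0.5):
    # Greedily match each faulty detection to a fault-free ("baseline") one.
    # Unmatched faulty boxes are fault-induced FPs; unmatched baseline
    # boxes are fault-induced FNs (lost TPs). The whole image is flagged
    # as a silent-data-corruption (SDC) event if either count is nonzero.
    unmatched = list(baseline)
    fps = 0
    for box in faulty:
        best = max(unmatched, key=lambda b: iou(box, b), default=None)
        if best is not None and iou(box, best) >= iou_thr:
            unmatched.remove(best)
        else:
            fps += 1
    fns = len(unmatched)
    return fps, fns, (fps > 0 or fns > 0)
```

Note how a single high-order bit flip can be catastrophic: flipping bit 30 (the exponent MSB) of 1.0 yields +inf, which, if it lands in a weight or activation, can corrupt every detection downstream, exactly the kind of image-wise damage an object-averaged metric can understate.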
