Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images

Paper Title

Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images

Authors

Sigman, John B., Spell, Gregory P., Liang, Kevin J., Carin, Lawrence

Abstract

Recently, progress has been made in the supervised training of Convolutional Object Detectors (e.g., Faster R-CNN) for threat recognition in carry-on luggage using X-ray images. This is part of the Transportation Security Administration's (TSA's) mission to protect air travelers in the United States. While more training data with threats may reliably improve performance for this class of deep algorithm, it is expensive to stage in realistic contexts. By contrast, data from the real world can be collected quickly with minimal cost. In this paper, we present a semi-supervised approach for threat recognition which we call Background Adaptive Faster R-CNN. This approach is a training method for two-stage object detectors which uses Domain Adaptation methods from the field of deep learning. The data sources described earlier make two "domains": a hand-collected data domain of images with threats, and a real-world domain of images assumed to contain no threats. Two domain discriminators, one for discriminating object proposals and one for image features, are adversarially trained to prevent encoding domain-specific information. Without this penalty, a Convolutional Neural Network (CNN) can learn to identify domains based on superficial characteristics, and minimize a supervised loss function without improving its ability to recognize objects. For the hand-collected data, only object proposals and image features from backgrounds are used. The losses for these domain-adaptive discriminators are added to the Faster R-CNN losses of images from both domains. This can reduce threat detection false alarm rates by matching the statistics of extracted features from hand-collected backgrounds to real-world data. Performance improvements are demonstrated on two independently collected datasets of labeled threats.
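
The loss combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the domain label convention (0 for hand-collected, 1 for real-world), the binary cross-entropy form of the discriminator loss, and the `weight` parameter are all assumptions made for illustration. The adversarial part of training (for example, reversing the discriminator gradients before they reach the feature extractor) is not shown.

```python
import math


def bce(p, label):
    """Binary cross-entropy for one predicted probability p that an item
    comes from the real-world domain (label 1) rather than the
    hand-collected domain (label 0). Label convention is an assumption."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def domain_loss(probs, domain_label, is_background=None):
    """Mean discriminator loss over a list of predicted probabilities.

    For hand-collected data, pass is_background so that only background
    items (proposals or feature locations) contribute, as the abstract
    describes. Returns 0.0 if nothing remains after filtering."""
    if is_background is not None:
        probs = [p for p, bg in zip(probs, is_background) if bg]
    if not probs:
        return 0.0
    return sum(bce(p, domain_label) for p in probs) / len(probs)


def total_loss(detection_loss, image_domain_loss, proposal_domain_loss, weight=0.1):
    """Detector loss plus the two weighted domain discriminator losses.
    The weight value is a hypothetical hyperparameter."""
    return detection_loss + weight * (image_domain_loss + proposal_domain_loss)


# Hypothetical discriminator outputs for proposals from one hand-collected image;
# the second proposal overlaps a threat, so it is excluded from the domain loss.
proposal_loss = domain_loss([0.5, 0.9], domain_label=0, is_background=[True, False])
image_loss = domain_loss([0.3, 0.4], domain_label=0, is_background=[True, True])
print(total_loss(detection_loss=1.0,
                 image_domain_loss=image_loss,
                 proposal_domain_loss=proposal_loss))
```

Masking out non-background items keeps threat regions from the hand-collected domain out of the domain-matching objective, since the real-world domain is assumed to contain no threats.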
