Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

Paper Title

Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

Authors

Irtiza Hasan, Shengcai Liao, Jinpeng Li, Saad Ullah Akram, Ling Shao

Abstract

Pedestrian detection is the cornerstone of many vision-based applications, from object tracking to video surveillance and, more recently, autonomous driving. With the rapid development of deep learning in object detection, pedestrian detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, in this study on generalizable pedestrian detectors, we show that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. We attribute the limited generalization to two main factors: the method and the current sources of data. Regarding the method, we illustrate that the bias present in the design choices (e.g., anchor settings) of current pedestrian detectors is the main contributing factor to the limited generalization. Most modern pedestrian detectors are tailored towards a target dataset, where they do achieve high performance in the traditional single-dataset training and testing pipeline, but their performance degrades under cross-dataset evaluation. Consequently, a general object detector performs better in cross-dataset evaluation than state-of-the-art pedestrian detectors, due to its generic design. As for the data, we show that the autonomous driving benchmarks are monotonous in nature, that is, they are neither diverse in scenarios nor dense in pedestrians. Therefore, benchmarks curated by crawling the web (which contain diverse and dense scenarios) are an efficient source of pre-training for providing a more robust representation. Accordingly, we propose a progressive fine-tuning strategy which improves generalization. Code and models can be accessed at https://github.com/hasanirtiza/Pedestron.
