Paper Title

End-to-End Object Detection with Transformers

Authors

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

Abstract

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.
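The core idea in the abstract is the set-based loss: each ground-truth object is assigned to exactly one of the fixed set of predictions via bipartite matching before any loss is computed, which is what removes the need for non-maximum suppression. The sketch below illustrates that matching step in plain Python, using brute-force search over assignments in place of the Hungarian algorithm; the actual DETR implementation uses PyTorch with `scipy.optimize.linear_sum_assignment` and a cost that also includes a generalized-IoU term, and the `bbox_weight` value here is illustrative, not the paper's exact setting.

```python
# Illustrative sketch of DETR-style bipartite matching (assumptions noted in
# the text above: brute-force assignment search stands in for the Hungarian
# algorithm, and the cost omits the generalized-IoU term).
from itertools import permutations

def l1(box_a, box_b):
    """L1 distance between two (cx, cy, w, h) boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match(pred_probs, pred_boxes, gt_labels, gt_boxes, bbox_weight=5.0):
    """Find the one-to-one assignment of ground-truth objects to object
    queries that minimizes classification cost plus weighted box cost.

    pred_probs: per-query class probabilities, shape (num_queries, num_classes)
    pred_boxes: per-query box predictions, shape (num_queries, 4)
    gt_labels, gt_boxes: ground-truth classes and boxes
    Returns a list of (gt_index, matched_query_index) pairs.
    """
    n, m = len(pred_probs), len(gt_labels)

    def pair_cost(q, t):
        # Negative probability of the true class, plus weighted L1 box distance.
        return -pred_probs[q][gt_labels[t]] + bbox_weight * l1(pred_boxes[q], gt_boxes[t])

    best = min(permutations(range(n), m),
               key=lambda perm: sum(pair_cost(q, t) for t, q in enumerate(perm)))
    return list(enumerate(best))
```

Because the matching is one-to-one, every query left unmatched is trained toward a "no object" class, which is how the model learns to emit a unique prediction per object instead of many overlapping ones.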
