Paper Title
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation
Paper Authors
Paper Abstract
Despite the previous success of object analysis, detecting and segmenting a large number of object categories with a long-tailed data distribution remains a challenging and less investigated problem. For a large-vocabulary classifier, the chance of obtaining noisy logits is much higher, which can easily lead to wrong recognition. In this paper, we exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes, and construct a classification tree that is responsible for parsing an object instance into a fine-grained category via its parent class. In the classification tree, since the number of parent class nodes is significantly smaller, their logits are less noisy and can be utilized to suppress the wrong/noisy logits existing in the fine-grained class nodes. As the way to construct parent classes is not unique, we further build multiple trees to form a classification forest, where each tree contributes its vote to the fine-grained classification. To alleviate the imbalanced learning caused by the long-tail phenomenon, we propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution. Our method, termed Forest R-CNN, can serve as a plug-and-play module applied to most object recognition models for recognizing more than 1000 categories. Extensive experiments are performed on the large-vocabulary dataset LVIS. Compared with the Mask R-CNN baseline, Forest R-CNN significantly boosts performance, with 11.5% and 3.9% AP improvements on the rare categories and overall categories, respectively. Moreover, we achieve state-of-the-art results on the LVIS dataset. Code is available at https://github.com/JialianW/Forest_RCNN.
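To make the tree-and-forest idea concrete, below is a minimal NumPy sketch of how parent-class logits could suppress noisy fine-grained logits, with each tree casting a vote. The function names, the multiplicative suppression rule, and the equal-weight averaging over trees are illustrative assumptions based only on the abstract; the paper's exact fusion scheme may differ.

```python
# A minimal sketch of the classification-forest idea described in the abstract.
# Assumptions (not from the paper): fine-grained probabilities are suppressed by
# multiplying with their parent's probability, and trees vote by simple averaging.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def forest_classify(fine_logits, parent_logits_per_tree, fine_to_parent_per_tree):
    """Fuse fine-grained logits with parent-class logits from several trees.

    fine_logits:             (num_fine,) noisy fine-grained class logits
    parent_logits_per_tree:  list of (num_parent_t,) parent-class logits, one per tree
    fine_to_parent_per_tree: list of (num_fine,) int arrays mapping each fine
                             class to its parent index in the corresponding tree
    """
    fine_probs = softmax(fine_logits)
    votes = []
    for parent_logits, fine_to_parent in zip(parent_logits_per_tree,
                                             fine_to_parent_per_tree):
        parent_probs = softmax(parent_logits)
        # Suppress a fine-grained class when its (less noisy) parent is unlikely.
        votes.append(fine_probs * parent_probs[fine_to_parent])
    # Each tree contributes an equal vote to the final fine-grained score.
    fused = np.mean(votes, axis=0)
    return fused / fused.sum()

# Toy usage: 3 fine classes {cat, dog, car}, one tree with parents {animal, vehicle}.
fine_logits = np.array([2.0, 1.8, 1.9])            # noisy: "car" scores almost as high
tree1_map = np.array([0, 0, 1])                    # cat, dog -> animal; car -> vehicle
print(forest_classify(fine_logits, [np.array([3.0, -1.0])], [tree1_map]))
```

In this toy case the confident "animal" parent logit pulls down the spurious "car" score, which is the suppression behavior the abstract describes.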
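The abstract does not spell out how NMS Resampling re-balances the data distribution. One hedged reading is a class-aware NMS whose IoU threshold depends on category frequency, so that candidate boxes for rare categories survive suppression more often during training. The rare/common/frequent grouping and the threshold values below are illustrative assumptions, not the paper's settings.

```python
# A hedged sketch of class-aware NMS in the spirit of NMS Resampling: boxes
# belonging to rarer frequency groups use a higher IoU threshold, so fewer of
# them are suppressed. All thresholds and groupings here are illustrative.
import numpy as np

def iou(box, boxes):
    # IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def class_aware_nms(boxes, scores, groups,
                    thresholds={"rare": 0.9, "common": 0.8, "frequent": 0.7}):
    """Greedy NMS where each box uses the IoU threshold of its frequency group."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        # A higher threshold for rare classes keeps more near-duplicate boxes,
        # effectively oversampling rare-category training examples.
        order = rest[overlaps <= thresholds[groups[i]]]
    return keep

# Toy usage: two heavily overlapping boxes; the rare one survives alongside it.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
groups = ["frequent", "rare", "common"]
print(class_aware_nms(boxes, scores, groups))
```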