Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition

Paper title

Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition

Authors

Gao, Bin-Bin; Zhou, Hong-Yu

Abstract

Multi-label image recognition is a practical and challenging task compared to single-label image classification. However, previous works may be suboptimal because of a great number of object proposals or complex attentional region generation modules. In this paper, we propose a simple but efficient two-stream framework to recognize multi-category objects from global image to local regions, similar to how human beings perceive objects. To bridge the gap between global and local streams, we propose a multi-class attentional region module which aims to make the number of attentional regions as small as possible and keep the diversity of these regions as high as possible. Our method can efficiently and effectively recognize multi-class objects with an affordable computation cost and a parameter-free region localization module. Over three benchmarks on multi-label image classification, we create new state-of-the-art results with a single model only using image semantics without label dependency. In addition, the effectiveness of the proposed method is extensively demonstrated under different factors such as global pooling strategy, input size and network architecture. Code has been made available at https://github.com/gaobb/MCAR.
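
The abstract does not spell out how regions are localized or how the two streams are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the global stream yields per-class confidence scores and per-class attention maps scaled to [0, 1], that a region is the bounding box of the pixels above a fixed threshold (which needs no learned parameters), and that the final prediction is a category-wise maximum over global and local scores. The names `top_k_regions` and `fuse` and the parameters `k` and `threshold` are hypothetical.

```python
import numpy as np


def top_k_regions(scores, attention_maps, k=2, threshold=0.5):
    """Return one bounding box for each of the k most confident classes.

    scores:         (C,) class confidences from the global stream (assumed).
    attention_maps: (C, H, W) per-class attention maps in [0, 1] (assumed).
    """
    boxes = []
    # Keep the number of regions small: look only at the top-k classes.
    for c in np.argsort(scores)[::-1][:k]:
        mask = attention_maps[c] >= threshold
        if not mask.any():
            continue  # no pixel passes the threshold for this class
        rows = np.flatnonzero(mask.any(axis=1))
        cols = np.flatnonzero(mask.any(axis=0))
        # (class, x_min, y_min, x_max, y_max), inclusive pixel indices.
        boxes.append(
            (int(c), int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1]))
        )
    return boxes


def fuse(global_scores, local_scores):
    """Category-wise maximum over the global and all local predictions.

    global_scores: (C,) scores for the whole image.
    local_scores:  (N, C) scores for N cropped regions.
    """
    return np.maximum(global_scores, local_scores.max(axis=0))
```

In this sketch, each box would be cropped from the input image, resized, and passed through the local stream to obtain one row of `local_scores`. Drawing boxes from different classes is one simple way to keep the regions diverse.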


Paper title