Paper Title

Focus Longer to See Better: Recursively Refined Attention for Fine-Grained Image Classification

Authors

Prateek Shroff, Tianlong Chen, Yunchao Wei, Zhangyang Wang

Abstract

Deep neural networks have made great strides on coarse-grained image classification, in part due to their strong ability to extract discriminative feature representations from images. However, the marginal visual differences between classes in fine-grained images make this task much harder. In this paper, we focus on these marginal differences to extract more representative features. Similar to human vision, our network repeatedly focuses on parts of an image to spot the small parts that discriminate between classes. Moreover, we show through interpretability techniques how our network's focus shifts from coarse to fine details. Through our experiments, we also show that a simple attention model can aggregate these finer details, via a weighted sum, to focus on the most dominant discriminative part of the image. Our network uses only image-level labels and needs no bounding-box or part annotations. Further, its simplicity makes it an easy plug-and-play module. Apart from providing interpretability, our network boosts performance by up to 2% over its baseline counterparts. Our codebase is available at https://github.com/TAMU-VITA/Focus-Longer-to-See-Better
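
To make the aggregation step concrete, below is a minimal PyTorch sketch of how a simple attention model might pool per-patch features into one weighted descriptor. Everything here is an illustrative assumption: the module name PatchAttention, the feature dimension, and the way patch features are produced are not from the paper; the authors' actual implementation is in the linked codebase.

```python
import torch
import torch.nn as nn


class PatchAttention(nn.Module):
    """Hypothetical sketch: aggregate per-patch features with learned
    attention weights, in the spirit of 'a simple attention model can
    aggregate these finer details via a weighted sum'. Dimensions and
    structure are illustrative, not the authors' implementation."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # One scalar relevance score per patch feature vector.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, feat_dim), e.g. backbone
        # features of recursively attended crops of the same image.
        weights = torch.softmax(self.score(patch_feats), dim=1)  # (B, P, 1)
        # Weighted sum over patches -> one discriminative descriptor.
        return (weights * patch_feats).sum(dim=1)                # (B, feat_dim)


if __name__ == "__main__":
    feats = torch.randn(4, 3, 512)   # 4 images, 3 attended patches each
    attn = PatchAttention(512)
    pooled = attn(feats)
    print(pooled.shape)              # torch.Size([4, 512])
```

A single linear layer scoring each patch is the simplest possible choice here; the "simple attention model" the abstract refers to could equally be a small MLP over the recursively attended crops.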
