Top-Down Attention

Paper Title

Unsupervised Foveal Vision Neural Networks with Top-Down Attention

Authors

Ryan Burt, Nina N. Thigpen, Andreas Keil, Jose C. Principe

Abstract

Deep learning architectures are an extremely powerful tool for recognizing and classifying images. However, they require supervised learning, normally work on vectors the size of image pixels, and produce the best results when trained on millions of object images. To help mitigate these issues, we propose the fusion of bottom-up saliency and top-down attention employing only unsupervised learning techniques, which helps the object recognition module to focus on relevant data and learn important features that can later be fine-tuned for a specific task. In addition, by utilizing only relevant portions of the data, the training speed can be greatly improved. We test the performance of the proposed Gamma saliency technique on the Toronto and CAT2000 databases, and the foveated vision on the Street View House Numbers (SVHN) database. The results in foveated vision show that Gamma saliency is comparable to the best and computationally faster. The results in SVHN show that our unsupervised cognitive architecture is comparable to fully supervised methods and that the Gamma saliency also improves CNN performance if desired. We also develop a top-down attention mechanism based on the Gamma saliency applied to the top layer of CNNs to improve scene understanding in multi-object images or images with strong background clutter. When we compare the results with human observers in an image dataset of animals occluded in natural scenes, we show that top-down attention is capable of disambiguating objects from the background and improves system performance beyond the level of human observers.
