Paper title
Top-Down Networks: A coarse-to-fine reimagination of CNNs
Paper authors
Abstract
Biological vision adopts a coarse-to-fine information processing pathway, from initial visual detection and binding of salient features of a visual scene, to the enhanced and preferential processing of relevant stimuli. In contrast, CNNs employ fine-to-coarse processing, moving from local, edge-detecting filters to more global ones that extract abstract representations of the input. In this paper we reverse the feature extraction part of standard bottom-up architectures and turn them upside down: we propose top-down networks. Our proposed coarse-to-fine pathway, by blurring higher-frequency information and restoring it only at later stages, offers a line of defence against adversarial attacks that introduce high-frequency noise. Moreover, since we increase image resolution with depth, the high resolution of the feature map in the final convolutional layer contributes to the explainability of the network's decision-making process. This favors object-driven decisions over context-driven ones, and thus provides better localized class activation maps. This paper offers empirical evidence for the applicability of top-down resolution processing to various existing architectures on multiple visual tasks.
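The coarse-to-fine input schedule described above can be illustrated with a small, dependency-free sketch. The abstract does not say which blur the authors use, so this sketch assumes non-overlapping 2x2 average pooling as the low-pass filter and nearest-neighbour upsampling to return to the finer grid; both are stand-ins, and the function names (`avg_pool2`, `upsample2`, `coarse_to_fine_inputs`, `detail`, `checkerboard`) are hypothetical. It shows three things: stages receive inputs from lowest to highest resolution, a pixel-level checkerboard perturbation (the highest-frequency pattern on the grid) vanishes entirely at the coarser level, and the high-frequency residual that a later stage would restore reconstructs the fine image exactly.

```python
from typing import List

# A grayscale image as a list of rows (hypothetical minimal representation).
Image = List[List[float]]


def avg_pool2(img: Image) -> Image:
    """Blur and downsample by 2 with non-overlapping 2x2 means (assumed low-pass filter)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def upsample2(img: Image) -> Image:
    """Nearest-neighbour upsampling by 2: restores the grid size, not the lost detail."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def coarse_to_fine_inputs(img: Image, levels: int) -> List[Image]:
    """Build a blur pyramid and return it coarsest-first (the top-down processing order)."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid[::-1]  # lowest resolution is consumed by the first stage


def detail(fine: Image, coarse: Image) -> Image:
    """High-frequency residual that a later, higher-resolution stage would add back."""
    up = upsample2(coarse)
    return [[f - u for f, u in zip(fr, ur)] for fr, ur in zip(fine, up)]


def checkerboard(n: int, eps: float) -> Image:
    """A +/-eps alternating pattern: a stand-in for high-frequency adversarial noise."""
    return [[eps if (i + j) % 2 == 0 else -eps for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    stages = coarse_to_fine_inputs(img, levels=3)
    print([len(s) for s in stages])  # [1, 2, 4]

    # Each 2x2 block of the checkerboard sums to zero, so the coarse input ignores it.
    print(avg_pool2(checkerboard(4, 0.1)))  # [[0.0, 0.0], [0.0, 0.0]]
```

In a full network each stage would be a convolutional block operating on the previous stage's upsampled features together with the next-finer input or residual; how the paper merges these is not described in the abstract, so this sketch stops at the input side. Average pooling was chosen only because its effect on a checkerboard is easy to verify by hand; a Gaussian blur would attenuate, rather than exactly cancel, such noise.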