Paper Title
SDA-$x$Net: Selective Depth Attention Networks for Adaptive Multi-scale Feature Representation
Paper Authors
Paper Abstract
Existing multi-scale solutions risk merely increasing receptive field sizes while neglecting small receptive fields. Thus, it is challenging to effectively construct adaptive neural networks that recognize objects at various spatial scales. To tackle this issue, we first introduce a new attention dimension, i.e., depth, in addition to the existing attention dimensions of channel, spatial, and branch, and present a novel selective depth attention network to symmetrically handle multi-scale objects in various vision tasks. Specifically, the blocks within each stage of a given neural network, e.g., ResNet, output hierarchical feature maps that share the same resolution but have different receptive field sizes. Based on this structural property, we design a stage-wise building module, namely SDA, which comprises a trunk branch and an SE-like attention branch. The block outputs of the trunk branch are fused to globally guide their depth attention allocation through the attention branch. According to the proposed attention mechanism, we can dynamically select features at different depths, which helps adaptively adjust the receptive field sizes for variable-sized input objects. In this way, the cross-block information interaction yields a long-range dependency along the depth direction. Compared with other multi-scale approaches, our SDA method combines multiple receptive fields from previous blocks into the stage output, thus offering a wider and richer range of effective receptive fields. Moreover, our method can serve as a pluggable module for other multi-scale networks as well as attention networks, coined SDA-$x$Net. Their combination further extends the range of effective receptive fields towards small receptive fields, enabling interpretable neural networks. Our source code is available at \url{https://github.com/QingbeiGuo/SDA-xNet.git}.
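To make the stage-wise design concrete, below is a minimal PyTorch sketch of the SDA idea as described in the abstract: the block outputs of one stage are fused, squeezed through an SE-like bottleneck, and re-weighted by an attention over the depth dimension. The sum fusion, the softmax normalization, the reduction ratio, and all layer names here are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class SDAModule(nn.Module):
    """Sketch of a stage-wise selective depth attention (SDA) module.

    Assumptions (not confirmed by the paper): trunk outputs are fused by
    element-wise summation, the attention branch is an SE-style
    squeeze-and-excitation bottleneck, and the depth attention weights
    are normalized with a softmax over the block (depth) dimension.
    """

    def __init__(self, channels: int, num_blocks: int, reduction: int = 16):
        super().__init__()
        self.channels = channels
        self.num_blocks = num_blocks
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.fc = nn.Sequential(             # excite: SE-like bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, num_blocks * channels),
        )

    def forward(self, block_outputs):
        # block_outputs: list of num_blocks tensors of shape (B, C, H, W),
        # i.e., the hierarchical feature maps produced within one stage.
        stacked = torch.stack(block_outputs, dim=1)        # (B, D, C, H, W)
        fused = stacked.sum(dim=1)                         # fuse trunk outputs
        squeezed = self.pool(fused).flatten(1)             # (B, C)
        logits = self.fc(squeezed).view(-1, self.num_blocks, self.channels)
        attn = logits.softmax(dim=1)                       # depth-wise attention
        # Select depth features: a weighted sum over blocks adaptively
        # mixes the receptive fields of the whole stage into its output.
        return (stacked * attn[..., None, None]).sum(dim=1)  # (B, C, H, W)


if __name__ == "__main__":
    # Three residual blocks of a hypothetical 56x56, 64-channel stage.
    blocks = [torch.randn(2, 64, 56, 56) for _ in range(3)]
    out = SDAModule(channels=64, num_blocks=3)(blocks)
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

Because the attention weights depend on the fused global context of the whole stage, small objects can shift weight toward shallow blocks (small receptive fields) and large objects toward deep blocks, which is the adaptive receptive-field behavior the abstract describes.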