SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

Paper title

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

Authors

Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng, Shi-Min Hu

Abstract

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics of successful segmentation models, we identify several key components that drive their performance improvements. This motivates us to design a novel convolutional attention network that uses cheap convolutional operations. Without bells and whistles, SegNeXt significantly improves on the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 with NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 of its parameters. On average, SegNeXt achieves about 2.0% mIoU improvement over state-of-the-art methods on the ADE20K dataset with the same or fewer computations. Code is available at https://github.com/uyzhang/JSeg (Jittor) and https://github.com/Visual-Attention-Network/SegNeXt (PyTorch).
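
The abstract reports its results in mIoU (mean intersection-over-union). For readers unfamiliar with the metric, the following is a minimal sketch of how it can be computed on flattened label maps. It is an illustration only and not code from the SegNeXt repositories; the function name `mean_iou` and the toy labels are hypothetical, and it assumes that classes absent from both prediction and ground truth are skipped. Benchmark toolkits typically accumulate counts over an entire dataset and handle ignore labels, which this sketch does not.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class labels
    (for example, flattened segmentation maps).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counted once in both intersection and union of class p.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Assumption: classes with an empty union are excluded from the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example: flattened label maps of a 2x3 image with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so a gain of "2.0% mIoU" corresponds to raising this averaged value by 0.02.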
