Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Paper Title

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Authors

Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng

Abstract

Research into Few-shot Semantic Segmentation (FSS) has attracted great attention, with the goal of segmenting target objects in a query image given only a few annotated support images of the target class. A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images. However, most existing approaches either compress the support information into a few class-wise prototypes or use only partial support information (e.g., only foreground) at the pixel level, causing non-negligible information loss. In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), where both foreground and background support information are fully exploited via multi-level pixel-wise correlations between paired query and support features. Implemented with the scaled dot-product attention in the Transformer architecture, DCAMA treats every query pixel as a token, computes its similarities with all support pixels, and predicts its segmentation label as an additive aggregation of all the support pixels' labels, weighted by the similarities. Based on the unique formulation of DCAMA, we further propose efficient and effective one-pass inference for n-shot segmentation, where pixels of all support images are collected for the mask aggregation at once. Experiments show that our DCAMA significantly advances the state of the art on standard FSS benchmarks of PASCAL-5i, COCO-20i, and FSS-1000, e.g., with 3.1%, 9.7%, and 3.6% absolute improvements in 1-shot mIoU over previous best records. Ablative studies also verify the design of DCAMA.
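
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single feature level, uses raw feature vectors as queries and keys with no learned projections, and represents each image as a flat list of per-pixel feature vectors. The function names `attention_weighted_mask` and `n_shot_one_pass` are hypothetical. Each query pixel receives a soft foreground score equal to the softmax-weighted sum of the support pixels' mask labels, and the n-shot case simply pools the pixels of all support images before a single pass.

```python
import math


def attention_weighted_mask(query_feats, support_feats, support_mask):
    """Return one soft foreground score per query pixel.

    query_feats:   list of N_q feature vectors (one per query pixel)
    support_feats: list of N_s feature vectors (one per support pixel)
    support_mask:  list of N_s labels, 1.0 = foreground, 0.0 = background
    """
    dim = len(query_feats[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    scores = []
    for q in query_feats:
        # Similarity of this query pixel to every support pixel.
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in support_feats]
        # Numerically stable softmax over the support pixels.
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        # Additive aggregation of support labels, weighted by similarity.
        scores.append(sum(e / total * m for e, m in zip(exps, support_mask)))
    return scores


def n_shot_one_pass(query_feats, shots):
    """Pool the pixels of all support images, then aggregate once.

    shots: list of (support_feats, support_mask) pairs, one per support image.
    """
    all_feats = [f for feats, _ in shots for f in feats]
    all_mask = [m for _, mask in shots for m in mask]
    return attention_weighted_mask(query_feats, all_feats, all_mask)
```

Because the softmax weights sum to one and the labels lie in [0, 1], every score also lies in [0, 1]. Background support pixels take part in the softmax normalization, so a query pixel that resembles background is pulled toward 0 rather than being ignored.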
