Attend and Segment: Attention Guided Active Semantic Segmentation

Title

Attend and Segment: Attention Guided Active Semantic Segmentation

Authors

Seifi, Soroush; Tuytelaars, Tinne

Abstract

In a dynamic environment, an agent with a limited field of view or limited resources cannot fully observe the scene before attempting to parse it. The deployment of common semantic segmentation architectures is not feasible in such settings. In this paper, we propose a method to gradually segment a scene given a sequence of partial observations. The main idea is to refine the agent's understanding of the environment by attending to the areas it is most uncertain about. Our method includes a self-supervised attention mechanism and a specialized architecture that maintains and exploits spatial memory maps to fill in the unseen areas of the environment. The agent can select and attend to an area while relying on cues from the visited areas to hallucinate the other parts. By processing only 18% of the image pixels (10 retina-like glimpses), we reach a mean pixel-wise accuracy of 78.1%, 80.9%, and 76.5% on the Cityscapes, CamVid, and KITTI datasets, respectively. We perform an ablation study on the number of glimpses, the input image size, and the effectiveness of retina-like glimpses. We compare our method to several baselines and show that the best results are achieved when the agent has access to a very low-resolution view of the scene at the first timestep.
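The core idea above, directing the next observation to the region the agent is most uncertain about, can be illustrated with a simple entropy heuristic. This is a minimal sketch under stated assumptions, not the authors' method: the paper uses a learned, self-supervised attention mechanism with retina-like glimpses, whereas here the names `select_glimpse` and `pixel_entropy`, the non-overlapping square tiling, and the use of summed per-pixel predictive entropy as the uncertainty score are all hypothetical stand-ins.

```python
import math
from typing import Optional, Sequence, Set, Tuple

# Hypothetical input format: probs[y][x] is the list of class probabilities
# the current (partial) segmentation assigns to pixel (y, x).
ProbMap = Sequence[Sequence[Sequence[float]]]
Cell = Tuple[int, int]  # (row, col) index of a square glimpse tile


def pixel_entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def select_glimpse(probs: ProbMap, tile: int, visited: Set[Cell]) -> Cell:
    """Return the unvisited tile with the highest total predictive entropy.

    Assumption: the image is split into non-overlapping tile x tile windows;
    the paper's glimpse geometry and learned attention are not reproduced.
    """
    height, width = len(probs), len(probs[0])
    best_cell: Optional[Cell] = None
    best_score = -1.0
    for r in range(height // tile):
        for c in range(width // tile):
            if (r, c) in visited:
                continue  # never re-attend an area that was already observed
            score = sum(
                pixel_entropy(probs[y][x])
                for y in range(r * tile, (r + 1) * tile)
                for x in range(c * tile, (c + 1) * tile)
            )
            if score > best_score:  # ties keep the first tile in scan order
                best_cell, best_score = (r, c), score
    if best_cell is None:
        raise ValueError("every tile has already been visited")
    return best_cell


if __name__ == "__main__":
    # Toy 4x4 prediction with 2 classes, split into four 2x2 tiles.
    confident, unsure, uniform = [0.99, 0.01], [0.9, 0.1], [0.5, 0.5]
    tile_probs = {(0, 0): confident, (0, 1): unsure,
                  (1, 0): unsure, (1, 1): uniform}
    probs = [[tile_probs[(y // 2, x // 2)] for x in range(4)] for y in range(4)]
    print(select_glimpse(probs, 2, set()))     # (1, 1): the uniform tile
    print(select_glimpse(probs, 2, {(1, 1)}))  # (0, 1): first of the tied tiles
```

Excluding visited tiles keeps the agent from spending its limited glimpse budget on regions it has already seen; in a full system the probability map would be refreshed from the spatial memory after each glimpse before the next selection.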
