Paper Title

Deep Hierarchical Semantic Segmentation

Authors

Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

Abstract

Humans are able to recognize structured relations in what they observe, allowing us to decompose complex scenes into simpler parts and abstract the visual world at multiple levels. However, this hierarchical reasoning ability of human perception remains largely unexplored in the current semantic segmentation literature. Existing work typically assumes flat labels and predicts a single target class for each pixel. In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at a structured, pixel-wise description of visual observation in terms of a class hierarchy. We devise HSSN, a general HSS framework that tackles two critical issues in this task: i) how to efficiently adapt existing hierarchy-agnostic segmentation networks to the HSS setting, and ii) how to leverage the hierarchy information to regularize HSS network learning. To address i), HSSN directly casts HSS as a pixel-wise multi-label classification task, bringing only minimal architectural change to current segmentation models. To address ii), HSSN first exploits inherent properties of the hierarchy as a training objective, which forces segmentation predictions to obey the hierarchy structure. Further, with hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space so as to generate well-structured pixel representations and ultimately improve segmentation. We conduct experiments on four semantic segmentation datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, and PASCAL-Person-Part) with different class hierarchies, segmentation network architectures, and backbones, showing the generalization and superiority of HSSN.
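
To make the idea of casting HSS as pixel-wise multi-label classification more concrete, the sketch below builds a multi-hot target for each pixel (the leaf class plus all of its ancestors) and checks whether a set of per-class scores respects the hierarchy. It is only an illustration, not the authors' implementation: the toy hierarchy, the class names, the helper functions, and the 0.5 threshold are all assumptions made for this example.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from any dataset in the paper.
PARENT = {
    "car": "vehicle",
    "bus": "vehicle",
    "vehicle": None,
    "person": "human",
    "rider": "human",
    "human": None,
}
CLASSES = sorted(PARENT)  # fixed class order used for the multi-hot vectors


def ancestors_and_self(cls):
    """Return cls followed by all of its ancestors up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def multi_hot(leaf):
    """Multi-label target for one pixel: 1 for the leaf and for every ancestor."""
    active = set(ancestors_and_self(leaf))
    return [1 if c in active else 0 for c in CLASSES]


def label_map_to_targets(label_map):
    """Convert an H x W map of leaf class names into an H x W x C multi-hot target."""
    return [[multi_hot(leaf) for leaf in row] for row in label_map]


def is_consistent(scores, threshold=0.5):
    """Check that every class predicted positive also has a positive parent.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    for cls, parent in PARENT.items():
        if parent is None:
            continue
        if scores[cls] > threshold and scores[parent] <= threshold:
            return False
    return True
```

With targets of this form, an ordinary segmentation head only needs one output channel per class in the hierarchy and a per-class binary loss, which is one way to read the "minimal architectural change" claim. The consistency check is a simple stand-in for the kind of hierarchy constraint the abstract describes as a training objective.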
