迈向整体场景理解：语义细分及以后

论文标题

迈向整体场景理解：语义细分及以后

Towards holistic scene understanding: Semantic segmentation and beyond

论文作者

Meletis, Panagiotis

论文摘要

本文介绍了视觉场景的理解并增强细分性能和概括，网络的训练效率以及整体理解。首先，我们在街道场景的背景下调查语义细分，并在各种数据集的组合中训练语义细分网络。在第2章中，我们在单个卷积主链上设计了一个层次分类器的框架，并将其端到端训练以像素标记的数据集的结合，从而提高了通用性和可识别的语义概念的数量。第3章重点是通过弱监督来丰富语义细分，并提出了一种弱监督的算法，用于培训，以限制框级和图像级的监督，而不仅仅是每金像素的监督。在第4章中解决了由多个数据集的同时培训引起的记忆和计算负载挑战。我们提出了两种方法，用于从数据集中从数据集中选择信息性和不同样本的方法，以减少网络的生态足迹而不牺牲性能。在记忆和计算效率要求的激励中，在第5章中，我们重新考虑了异质数据集的同时培训，并提出了一个通用的语义分割框架。该框架通过利用各种场景理解数据集来实现性能指标和语义知识趋势的一致性。第6章介绍了部分感知的全景细分的新任务，该任务将我们的推理扩展到整体场景的理解。此任务将场景和零件级语义与实例级对象检测结合在一起。总之，我们的贡献涵盖了卷积网络体系结构，弱监督的学习，部分和全景分段，为整体，丰富和可持续的视觉场景理解铺平了道路。

This dissertation addresses visual scene understanding and enhances segmentation performance and generalization, training efficiency of networks, and holistic understanding. First, we investigate semantic segmentation in the context of street scenes and train semantic segmentation networks on combinations of various datasets. In Chapter 2 we design a framework of hierarchical classifiers over a single convolutional backbone, and train it end-to-end on a combination of pixel-labeled datasets, improving generalizability and the number of recognizable semantic concepts. Chapter 3 focuses on enriching semantic segmentation with weak supervision and proposes a weakly-supervised algorithm for training with bounding box-level and image-level supervision instead of only with per-pixel supervision. The memory and computational load challenges that arise from simultaneous training on multiple datasets are addressed in Chapter 4. We propose two methodologies for selecting informative and diverse samples from datasets with weak supervision to reduce our networks' ecological footprint without sacrificing performance. Motivated by memory and computation efficiency requirements, in Chapter 5, we rethink simultaneous training on heterogeneous datasets and propose a universal semantic segmentation framework. This framework achieves consistent increases in performance metrics and semantic knowledgeability by exploiting various scene understanding datasets. Chapter 6 introduces the novel task of part-aware panoptic segmentation, which extends our reasoning towards holistic scene understanding. This task combines scene and parts-level semantics with instance-level object detection. In conclusion, our contributions span over convolutional network architectures, weakly-supervised learning, part and panoptic segmentation, paving the way towards a holistic, rich, and sustainable visual scene understanding.

下载PDF全文

下载文献需遵守相关版权规定

论文标题