Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models

Paper Title

Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models

Authors

Xuhong Li, Haoyi Xiong, Yi Liu, Dingfu Zhou, Zeyu Chen, Yaqing Wang, Dejing Dou

Abstract

While fine-tuning pre-trained networks has become a popular way to train image segmentation models, the backbone networks for image segmentation are frequently pre-trained on image classification source datasets, e.g., ImageNet. Although image classification datasets can provide the backbone networks with rich visual features and discriminative ability, they cannot fully pre-train the target model (i.e., backbone + segmentation modules) in an end-to-end manner. Because classification datasets lack segmentation labels, the segmentation modules are left randomly initialized in the fine-tuning process. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL) to enable end-to-end pre-training of image segmentation models on classification datasets. PSSL is inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, are close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers, which lowers the bias caused by any single model. With PSSL for every image of ImageNet, the proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse. Experimental results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy significantly improves the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and CityScapes. The source code is available at https://github.com/PaddlePaddle/PaddleSeg.
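
The aggregation step in the abstract can be illustrated with a small sketch. The abstract does not say how the explanations are combined, so everything here is an assumption made for illustration: the min-max normalization, the plain averaging across classifiers, the two thresholds, the `IGNORE` value, and the function names `normalize` and `ensemble_pseudo_label` are all hypothetical and are not taken from the paper or from PaddleSeg.

```python
from typing import List, Sequence

Map = List[List[float]]   # one explanation map: H rows of W scores

IGNORE = 255  # hypothetical label for uncertain pixels, to be skipped by a loss


def normalize(m: Map) -> Map:
    """Min-max rescale one explanation map to [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no localization signal.
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def ensemble_pseudo_label(
    maps: Sequence[Map],
    class_id: int,
    fg_thresh: float = 0.6,   # assumed threshold, not from the paper
    bg_thresh: float = 0.3,   # assumed threshold, not from the paper
) -> List[List[int]]:
    """Average normalized maps from several classifiers and threshold them.

    Pixels with a high mean score get the image-level class id (expected
    to be >= 1), pixels with a low mean score get background (0), and the
    rest are marked IGNORE.
    """
    normed = [normalize(m) for m in maps]
    height, width = len(normed[0]), len(normed[0][0])
    label = []
    for i in range(height):
        row = []
        for j in range(width):
            score = sum(m[i][j] for m in normed) / len(normed)
            if score >= fg_thresh:
                row.append(class_id)
            elif score <= bg_thresh:
                row.append(0)
            else:
                row.append(IGNORE)
        label.append(row)
    return label


# Toy 2x2 explanation maps standing in for two different classifiers.
map_a = [[0.0, 2.0], [4.0, 8.0]]
map_b = [[1.0, 1.0], [3.0, 5.0]]
pseudo_label = ensemble_pseudo_label([map_a, map_b], class_id=7)
```

Averaging over several classifiers is one simple way to reduce the influence of any single model's explanation. The middle band between the two thresholds leaves ambiguous pixels unlabeled, which is one possible reading of why a weighted segmentation loss would be useful during pre-training.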
