Paper Title
Unlabeled Data Guided Semi-supervised Histopathology Image Segmentation
Paper Authors
Paper Abstract
Automatic histopathology image segmentation is crucial to disease analysis. Limited available labeled data hinders the generalizability of models trained under the fully supervised setting. Semi-supervised learning (SSL) based on generative methods has proven effective in utilizing diverse image characteristics. However, which kinds of generated images are most useful for model training, and how to use such images, has not been well explored. In this paper, we propose a new data-guided generative method for histopathology image segmentation that leverages the unlabeled data distributions. First, we design an image generation module. Image content and style are disentangled and embedded in a clustering-friendly space to utilize their distributions. New images are synthesized by sampling and cross-combining contents and styles. Second, we devise an effective data selection policy for judiciously sampling the generated images: (1) to make the generated training set better cover the dataset, the clusters that are underrepresented in the original training set are covered more; (2) to make the training process more effective, we identify and oversample the images of "hard cases" in the data for which annotated training data may be scarce. Our method is evaluated on gland and nuclei datasets. We show that under both the inductive and transductive settings, our SSL method consistently boosts the performance of common segmentation models and attains state-of-the-art results.
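
The two-part selection policy described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the "deficit" rule below, the function names `cluster_sampling_weights` and `select_generated`, and the optional `hardness` score are all assumptions chosen only to show the idea of favoring clusters that the labeled set covers poorly and oversampling hard cases.

```python
import numpy as np


def cluster_sampling_weights(train_clusters, all_clusters, n_clusters):
    """Return per-cluster sampling weights (hypothetical rule, not from the paper).

    A cluster gets more weight the more its share of the whole dataset
    exceeds its share of the labeled training set.
    """
    train_clusters = np.asarray(train_clusters)
    all_clusters = np.asarray(all_clusters)
    train_freq = np.bincount(train_clusters, minlength=n_clusters) / len(train_clusters)
    data_freq = np.bincount(all_clusters, minlength=n_clusters) / len(all_clusters)
    # Positive part of the gap between dataset coverage and labeled coverage.
    deficit = np.clip(data_freq - train_freq, 0.0, None)
    if deficit.sum() == 0.0:
        # Labeled set already matches the data distribution: follow it as is.
        return data_freq
    return deficit / deficit.sum()


def select_generated(gen_clusters, weights, n_select, hardness=None, seed=None):
    """Sample indices of generated images, with replacement.

    `hardness` is an assumed per-image score in [0, 1]; higher values are
    oversampled. How "hard cases" are identified is left open here.
    """
    gen_clusters = np.asarray(gen_clusters)
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)[gen_clusters]
    if hardness is not None:
        p = p * (1.0 + np.asarray(hardness, dtype=float))
    if p.sum() == 0.0:
        # No generated image falls in a favored cluster: sample uniformly.
        p = np.ones(len(gen_clusters))
    p = p / p.sum()
    return rng.choice(len(gen_clusters), size=n_select, replace=True, p=p)
```

In this sketch, cluster assignments would come from clustering the content and style embeddings; sampling with replacement is used so that scarce clusters can be oversampled even when few generated images belong to them.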