Paper title
K-textures, a self-supervised hard clustering deep learning algorithm for satellite image segmentation
Paper authors
Paper abstract
Self-supervised deep learning algorithms that can segment an image into a fixed number of hard labels, as the k-means algorithm does, while relying only on deep learning techniques are still lacking. Here, we introduce the k-textures algorithm, which provides self-supervised segmentation of a 4-band image (RGB-NIR) into $k$ classes. An example of its application to high-resolution Planet satellite imagery is given. Our algorithm shows that discrete search is feasible using convolutional neural networks (CNN) and gradient descent. The model detects $k$ hard clustering classes, represented in the model as $k$ discrete binary masks and their associated $k$ independently generated textures, which combined form a simulation of the original image. The similarity loss is the mean squared error between the features of the original and the simulated image, both extracted from the penultimate convolutional block of the Keras VGG-16 model pretrained on 'imagenet' and from a custom feature extractor made with Planet data. The main advances of the k-textures model are as follows. First, the $k$ discrete binary masks are obtained inside the model using gradient descent; they are generated by a novel method based on a hard sigmoid activation function. Second, the model provides hard clustering classes: each pixel has only one class. Finally, in contrast to k-means, where each pixel is considered independently, contextual information is also considered here, and each class is associated not only with similar values in the color channels but also with a texture. Our approach is designed to ease the production of training samples for satellite image segmentation, and the k-textures architecture could be adapted to support different numbers of bands and more complex tasks, such as object self-segmentation. The model code and weights are available at https://doi.org/10.5281/zenodo.6359859
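The composition step described in the abstract (k binary masks, each paired with one texture, summed into a simulated image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, array shapes, and random inputs are hypothetical, the hard masks are obtained here with a plain argmax rather than the paper's gradient-based method, the hard sigmoid uses one common definition (slope 0.2, offset 0.5), and the loss is computed on pixels as a stand-in for the VGG-16 and custom feature extractors.

```python
import numpy as np

def hard_sigmoid(x, slope=0.2, offset=0.5):
    """Piecewise-linear sigmoid (one common definition, assumed here).
    Outputs are exactly 0 or 1 outside the central linear zone."""
    return np.clip(slope * x + offset, 0.0, 1.0)

def one_hot_masks(scores):
    """Turn (H, W, k) class scores into k binary masks, one class per pixel.
    Illustrative argmax only; the paper obtains masks by gradient descent."""
    k = scores.shape[-1]
    return np.eye(k, dtype=scores.dtype)[scores.argmax(axis=-1)]

def compose(masks, textures):
    """Simulated image as the sum over classes of mask_k * texture_k.
    masks: (H, W, k); textures: (k, H, W, C); result: (H, W, C)."""
    return np.einsum("hwk,khwc->hwc", masks, textures)

def mse(a, b):
    """Mean squared error; applied to pixels here, to features in the paper."""
    return float(np.mean((a - b) ** 2))

# Hypothetical toy inputs: an 8x8 image with 4 bands (RGB-NIR) and k = 3 classes.
rng = np.random.default_rng(0)
H, W, C, k = 8, 8, 4, 3
scores = rng.normal(size=(H, W, k))
textures = rng.uniform(size=(k, H, W, C))

masks = one_hot_masks(scores)
simulated = compose(masks, textures)
```

Because the masks are one-hot along the class axis, every pixel of `simulated` is copied from exactly one texture, which is what distinguishes hard clustering from a soft, weighted blend of classes.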