Paper title
AutoEmbedder: A semi-supervised DNN embedding system for clustering
Paper authors
Paper abstract
Clustering is a widely used unsupervised learning method for handling unlabeled data. Deep clustering has become a popular research area that combines clustering with deep neural network (DNN) architectures. Deep clustering methods downsample high-dimensional data, a step that may also be related to a clustering loss. Deep clustering has also been introduced in semi-supervised learning (SSL). Most SSL methods depend on pairwise constraint information, a matrix that records whether each pair of data points can belong to the same cluster. This paper introduces a novel embedding system named AutoEmbedder, which downsamples higher-dimensional data to clusterable embedding points. To the best of our knowledge, this is the first research endeavor that relates a traditional classifier DNN architecture with a pairwise loss reduction technique. The training process is semi-supervised and uses a Siamese network architecture to compute the pairwise constraint loss in the feature learning phase. AutoEmbedder outperforms most existing DNN-based semi-supervised methods tested on well-known datasets.
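To make the notion of pairwise constraint information concrete, the following is a minimal sketch in plain Python. It builds a constraint matrix from a small labeled subset and evaluates a generic contrastive-style pairwise loss on a set of embedding points. The function names `pairwise_constraints` and `contrastive_pair_loss`, the `margin` parameter, and the loss form itself are assumptions chosen for illustration; the abstract does not specify the exact loss that AutoEmbedder uses.

```python
from itertools import combinations
from math import dist


def pairwise_constraints(labels):
    """Return an n x n matrix: 1 if two samples share a label (must-link),
    0 otherwise (cannot-link). Hypothetical helper for illustration."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def contrastive_pair_loss(embeddings, constraints, margin=1.0):
    """Mean of a generic contrastive-style loss over all unordered pairs.

    Assumption: this is a standard contrastive form, not necessarily the
    loss used in the paper. Must-link pairs are penalized by their squared
    distance; cannot-link pairs are penalized only when closer than `margin`.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(embeddings)), 2):
        d = dist(embeddings[i], embeddings[j])  # Euclidean distance
        if constraints[i][j] == 1:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
        count += 1
    return total / count


# Three labeled samples: the first and third share a class.
labels = ["cat", "dog", "cat"]
constraints = pairwise_constraints(labels)

# Toy 2-D embedding points (made up for this example).
embeddings = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
loss = contrastive_pair_loss(embeddings, constraints)
```

In a Siamese setup, the two inputs of each pair would pass through the same network with shared weights, and a loss of this kind would be computed on the resulting pair of embeddings.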