Paper title
A Critique of Self-Expressive Deep Subspace Clustering
Paper authors
Paper abstract
Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces, with each subspace defining a cluster with dimension lower than the ambient space. Many existing formulations for this problem are based on exploiting the self-expressive property of linear subspaces, where any point within a subspace can be represented as a linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an embedding of the original data using a neural network that is regularized by a self-expressive loss function on the embedded data, which encourages a union-of-linear-subspaces prior in the embedded space. Here we show that this approach has a number of potential flaws that have not been adequately addressed in prior work. In particular, we show that the model formulation is often ill-posed in that it can lead to a degenerate embedding of the data, which need not correspond to a union of subspaces at all and is poorly suited for clustering. We validate our theoretical results experimentally and also repeat prior experiments reported in the literature, from which we conclude that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post-processing step rather than to the deep subspace clustering model.
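The self-expressive property mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's model or experimental setup; it is a minimal NumPy example under assumed settings (random subspaces of dimension 3 and 2 in a 10-dimensional ambient space, and the hypothetical helper names `sample_subspace` and `self_expressive_residual`). It checks that a point is reproduced almost exactly by a linear combination of other points from its own subspace, but not by points from a different subspace.

```python
import numpy as np


def sample_subspace(ambient_dim, sub_dim, n_points, rng):
    """Draw n_points (as columns) from a random sub_dim-dimensional
    linear subspace of R^ambient_dim. Hypothetical helper for this sketch."""
    # Orthonormal basis of a random subspace via reduced QR.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))
    return basis @ rng.standard_normal((sub_dim, n_points))


def self_expressive_residual(target, others):
    """Norm of target - others @ c, where c are least-squares coefficients
    expressing `target` as a linear combination of the columns of `others`."""
    coeffs, *_ = np.linalg.lstsq(others, target, rcond=None)
    return float(np.linalg.norm(target - others @ coeffs))


rng = np.random.default_rng(0)
A = sample_subspace(10, 3, 20, rng)  # points from the first subspace
B = sample_subspace(10, 2, 20, rng)  # points from the second subspace

# Express the first point of A using the remaining points of A,
# then using only points of B.
res_same = self_expressive_residual(A[:, 0], A[:, 1:])
res_cross = self_expressive_residual(A[:, 0], B)

print(f"residual using same-subspace points:  {res_same:.2e}")
print(f"residual using other-subspace points: {res_cross:.2e}")
```

In this sketch the same-subspace residual should be at the level of floating-point error, while the cross-subspace residual should be clearly nonzero. Self-expressive clustering methods rely on this gap when building an affinity between points, and the abstract's concern is that a learned embedding can satisfy a self-expressive loss without preserving such structure.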