A Critique of Self-Expressive Deep Subspace Clustering

Title

A Critique of Self-Expressive Deep Subspace Clustering

Authors

Benjamin D. Haeffele, Chong You, René Vidal

Abstract

Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces, with each subspace defining a cluster with dimension lower than the ambient space. Many existing formulations for this problem are based on exploiting the self-expressive property of linear subspaces, where any point within a subspace can be represented as a linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an embedding of the original data using a neural network which is regularized by a self-expressive loss function on the data in the embedded space, in order to encourage a union-of-linear-subspaces prior on the embedded data. Here we show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work. In particular, we show the model formulation is often ill-posed in that it can lead to a degenerate embedding of the data, which need not correspond to a union of subspaces at all and is poorly suited for clustering. We validate our theoretical results experimentally and also repeat prior experiments reported in the literature, where we conclude that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post-processing step rather than the deep subspace clustering model.
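
As a rough illustration of the self-expressive property mentioned in the abstract, the NumPy sketch below samples points from two random low-dimensional linear subspaces and checks that a point can be written as a linear combination of other points from its own subspace, but not of points from the other subspace. The dimensions, sample counts, and the helper names `sample_subspace` and `reconstruction_residual` are assumptions made for this example and do not come from the paper. It uses plain least squares, not the regularized objectives or the neural network embedding that the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_subspace(ambient_dim, sub_dim, n_points, rng):
    """Draw n_points from a random sub_dim-dimensional linear subspace.

    Hypothetical helper for this sketch; points are stored as columns.
    """
    basis = rng.standard_normal((ambient_dim, sub_dim))
    coeffs = rng.standard_normal((sub_dim, n_points))
    return basis @ coeffs


def reconstruction_residual(target, dictionary):
    """Residual norm of the best least-squares fit of target
    as a linear combination of the dictionary columns."""
    c, *_ = np.linalg.lstsq(dictionary, target, rcond=None)
    return float(np.linalg.norm(dictionary @ c - target))


# Illustrative sizes (assumptions): two 3-dimensional subspaces
# in a 20-dimensional ambient space, 10 points each.
X1 = sample_subspace(20, 3, 10, rng)
X2 = sample_subspace(20, 3, 10, rng)

x = X1[:, 0]
# Express x using the OTHER points of its own subspace.
same_residual = reconstruction_residual(x, X1[:, 1:])
# Try to express x using points from the other subspace.
cross_residual = reconstruction_residual(x, X2)

print(f"same-subspace residual:  {same_residual:.2e}")
print(f"cross-subspace residual: {cross_residual:.2e}")
```

In this toy setting the same-subspace residual should be numerically zero while the cross-subspace residual is not, which is the structure that self-expressive methods exploit to group points.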
