Paper Title

Stochastic Sparse Subspace Clustering

Authors

Ying Chen, Chun-Guang Li, Chong You

Abstract

State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of the other data points. By enforcing such representations to be sparse, sparse subspace clustering is guaranteed to produce a subspace-preserving data affinity in which two points are connected only if they lie in the same subspace. On the other hand, however, data points from the same subspace may not be well connected, leading to the problem of over-segmentation. We introduce dropout to address this problem, based on randomly dropping out data points in the self-expressive model. In particular, we show that dropout is equivalent to adding a squared $\ell_2$-norm regularization on the representation coefficients, and therefore induces denser solutions. We then reformulate the optimization problem as a consensus problem over a set of small-scale subproblems. This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering, which can effectively handle large-scale datasets. Extensive experiments on synthetic data and real-world datasets validate the efficiency and effectiveness of our proposal.
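The claimed equivalence between dropout and squared $\ell_2$-norm regularization can be checked numerically on a toy self-expressive problem. The sketch below is an illustration of the identity, not the paper's implementation: with inverted-scaling Bernoulli dropout on the data points and unit-norm columns, the expected dropout residual equals the plain residual plus a $\frac{\delta}{1-\delta}\|c\|_2^2$ penalty, where $\delta$ is the drop probability. All names and sizes here are arbitrary choices for the demonstration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy self-expressive setup: express point x = X[:, j] using the other
# columns of X (hypothetical sizes, chosen only for this check).
n, d = 8, 5
X = rng.standard_normal((d, n))
X /= np.linalg.norm(X, axis=0)   # unit-norm columns, assumed below
j = 0
x = X[:, j]
c = rng.standard_normal(n)
c[j] = 0.0                       # a point never represents itself
delta = 0.3                      # probability of dropping a point

# Exact expectation of the dropout residual over all 2^n masks, with the
# usual 1/(1 - delta) rescaling of the kept points.
expected = 0.0
for mask in itertools.product([0, 1], repeat=n):
    r = np.array(mask, dtype=float)
    prob = np.prod(np.where(r == 1, 1.0 - delta, delta))
    resid = x - X @ (r * c) / (1.0 - delta)
    expected += prob * np.sum(resid ** 2)

# Closed form: plain residual plus a squared ell_2 penalty on c.
closed = np.sum((x - X @ c) ** 2) + delta / (1.0 - delta) * np.sum(c ** 2)

print(abs(expected - closed) < 1e-8)  # the two objectives agree
```

Because the Bernoulli masks are independent across points, the cross terms vanish in expectation and only the per-coefficient variance $\frac{\delta}{1-\delta}c_i^2\|x_i\|_2^2$ survives, which reduces to the squared $\ell_2$ norm of $c$ when the columns are normalized.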
