大规模的子空间聚类通过K因子化

论文标题

大规模的子空间聚类通过K因子化

Large-Scale Subspace Clustering via k-Factorization

论文作者

Fan, Jicong

论文摘要

子空间聚类（SC）的目的是将数据集中在低维子空间的结合中。通常，SC学习亲和力矩阵，然后执行光谱群集。这两个步骤都均均具有较高的时间和空间复杂性，这导致了聚集大数据集的困难。本文提出了一种用于大规模子空间聚类的称为K-Factorization子空间聚类（K-FSC）的方法。 K-FSC通过在矩阵分解模型中追求结构化的稀疏性直接将数据分解为K组。因此，K-FSC避免学习亲和力矩阵并执行特征值分解，并且在大型数据集上具有较低的（线性）时间和空间复杂性。本文从理论上证明了K-FSC模型的有效性。提出了具有收敛保证的有效算法来解决K-FSC的优化。此外，K-FSC能够处理稀疏的噪声，异常值和缺少数据，这些数据在实际应用中无处不在。本文还为K-FSC提供了在线扩展和样本外扩展，以处理流数据和群集任意大型数据集。大规模真实数据集的广泛实验表明，K-FSC及其扩展的表现优于亚面群集的最先进方法。

Subspace clustering (SC) aims to cluster data lying in a union of low-dimensional subspaces. Usually, SC learns an affinity matrix and then performs spectral clustering. Both steps suffer from high time and space complexity, which leads to difficulty in clustering large datasets. This paper presents a method called k-Factorization Subspace Clustering (k-FSC) for large-scale subspace clustering. K-FSC directly factorizes the data into k groups via pursuing structured sparsity in the matrix factorization model. Thus, k-FSC avoids learning affinity matrix and performing eigenvalue decomposition, and has low (linear) time and space complexity on large datasets. This paper proves the effectiveness of the k-FSC model theoretically. An efficient algorithm with convergence guarantee is proposed to solve the optimization of k-FSC. In addition, k-FSC is able to handle sparse noise, outliers, and missing data, which are pervasive in real applications. This paper also provides online extension and out-of-sample extension for k-FSC to handle streaming data and cluster arbitrarily large datasets. Extensive experiments on large-scale real datasets show that k-FSC and its extensions outperform state-of-the-art methods of subspace clustering.

下载PDF全文

下载文献需遵守相关版权规定

论文标题