论文标题
了解光谱聚类算法的概括性能
Understanding the Generalization Performance of Spectral Clustering Algorithms
论文作者
论文摘要
光谱聚类的理论分析主要集中于一致性,而对其泛化性能的研究相对较少。在本文中,我们研究了流行光谱聚类算法的多余风险范围:\ emph {leased} ratiocut and \ emph {sleased} ncut。首先,我们表明,它们在经验连续的最佳解决方案和总体级连续最佳解决方案之间具有$ \ Mathcal {O}(1/\ sqrt {n})$收敛速率,其中$ n $是样本大小。其次,我们显示了影响经验离散最佳解决方案与总体级离散最佳解决方案之间过度风险的基本数量。在经验水平上,可以设计算法来减少此数量。基于我们的理论分析,我们提出了两种新型算法,这些算法不仅可以惩罚此数量,而且可以将样本外数据集中在没有重新凝聚的整体样本上。实验验证了所提出的算法的有效性。
The theoretical analysis of spectral clustering mainly focuses on consistency, while there is relatively little research on its generalization performance. In this paper, we study the excess risk bounds of the popular spectral clustering algorithms: \emph{relaxed} RatioCut and \emph{relaxed} NCut. Firstly, we show that their excess risk bounds between the empirical continuous optimal solution and the population-level continuous optimal solution have a $\mathcal{O}(1/\sqrt{n})$ convergence rate, where $n$ is the sample size. Secondly, we show the fundamental quantity in influencing the excess risk between the empirical discrete optimal solution and the population-level discrete optimal solution. At the empirical level, algorithms can be designed to reduce this quantity. Based on our theoretical analysis, we propose two novel algorithms that can not only penalize this quantity, but also cluster the out-of-sample data without re-eigendecomposition on the overall sample. Experiments verify the effectiveness of the proposed algorithms.