Title
Robust covariance estimation for distributed principal component analysis
Authors
Abstract
Fan et al. [Annals of Statistics 47(6) (2019) 3009-3031] constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, their theoretical guarantees hold only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by using the robust covariance matrix estimators of Minsker [Annals of Statistics 46(6A) (2018) 2871-2903] and Ke et al. [Statistical Science 34(3) (2019) 454-471]. The theoretical results show that when the sampling distribution has symmetric innovations with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that our algorithm is robust to heavy-tailed data and outliers.
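The two ingredients combined in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's exact algorithm. It assumes mean-zero data. It uses a simple element-wise truncated second-moment estimator in the spirit of Ke et al. (Minsker's spectrum-wise estimator is not shown) with a hypothetical fixed truncation level `tau` instead of a data-driven choice. It then applies a one-shot aggregation in the style of Fan et al.: each server sends only its top-k eigenvectors, and the central server takes the top-k eigenvectors of the averaged projection matrices. The helper names `truncated_covariance`, `top_k_eigenvectors` and `distributed_pca` are hypothetical.

```python
import numpy as np

def truncated_covariance(X, tau):
    """Hypothetical helper: element-wise truncated second-moment estimate.

    Assumes the rows of X are i.i.d. with mean zero. Each product
    X[i, j] * X[i, k] is clipped to [-tau, tau] before averaging, so a single
    sample can move any entry by at most tau / n.
    """
    n = X.shape[0]
    products = X[:, :, None] * X[:, None, :]   # products[i, j, k] = X[i, j] * X[i, k]
    return np.clip(products, -tau, tau).sum(axis=0) / n

def top_k_eigenvectors(S, k):
    """Hypothetical helper: d x k eigenvectors of symmetric S for the k largest eigenvalues."""
    _, eigvecs = np.linalg.eigh(S)             # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]

def distributed_pca(blocks, k, tau):
    """Hypothetical sketch: local robust top-k eigenvectors, averaged projectors."""
    d = blocks[0].shape[1]
    P = np.zeros((d, d))
    for X in blocks:                            # each block stands in for one server's data
        V = top_k_eigenvectors(truncated_covariance(X, tau), k)
        P += V @ V.T                            # only the d x k matrix V would be communicated
    return top_k_eigenvectors(P / len(blocks), k)

# Illustrative usage on heavy-tailed data (Student t with 5 degrees of freedom,
# which has a finite fourth moment) and a spiked diagonal covariance.
rng = np.random.default_rng(0)
d, k, m, n = 10, 2, 5, 400
scales = np.sqrt(np.array([10.0, 8.0] + [1.0] * (d - 2)))
blocks = [rng.standard_t(df=5, size=(n, d)) * scales for _ in range(m)]
V_hat = distributed_pca(blocks, k, tau=100.0)
V_true = np.eye(d)[:, :k]                       # true leading eigenspace is span(e1, e2)
err = np.linalg.norm(V_hat @ V_hat.T - V_true @ V_true.T)  # Frobenius distance between projectors
print(f"projector distance: {err:.3f}")
```

Setting `tau=np.inf` recovers the ordinary (uncentered) sample second-moment matrix. Element-wise clipping trades a small bias for much lighter tails in each entry, which is the general idea behind truncation-based robust covariance estimators.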