Paper Title

High-dimensional outlier detection using random projections

Authors

Navarro-Esteban, P., Cuesta-Albertos, J. A.

Abstract

Many methods exist in the literature for detecting outliers in multivariate data, but most of them require estimating the covariance matrix. The higher the dimension, the harder this estimation becomes, and in high dimensions it becomes impossible. To avoid estimating this matrix, we propose a novel procedure based on random projections to detect outliers in Gaussian multivariate data. It consists of projecting the data onto several one-dimensional subspaces, in each of which an appropriate univariate outlier detection method is applied; this method is similar to Tukey's method, but its threshold depends on the initial dimension and the sample size. The required number of projections is determined using sequential analysis. Simulated and real datasets illustrate the performance of the proposed method.
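The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the threshold formula or the sequential stopping rule, so this sketch assumes a fixed number of projections (`n_projections`) and a fixed Tukey fence multiplier (`k`). Both parameters, and the function name `random_projection_outliers`, are hypothetical choices made for illustration.

```python
import numpy as np


def random_projection_outliers(X, n_projections=100, k=3.0, seed=0):
    """Flag rows of X lying outside Tukey-style fences in at least one
    random one-dimensional projection.

    Assumptions (not taken from the paper): the number of projections and
    the fence multiplier k are fixed by the caller, whereas the paper
    chooses the threshold from the dimension and sample size and picks the
    number of projections by sequential analysis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Draw random directions uniformly on the unit sphere in R^d.
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project every observation onto every direction: shape (n, n_projections).
    projected = X @ directions.T

    # Tukey-style fences computed separately for each projection.
    q1, q3 = np.percentile(projected, [25, 75], axis=0)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    # A point is flagged if it falls outside the fences in any projection.
    outside = (projected < lower) | (projected > upper)
    return outside.any(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.standard_normal((200, 20))      # Gaussian bulk
    planted = np.full((1, 20), 50.0)            # one distant point
    flags = random_projection_outliers(np.vstack([clean, planted]))
    print("indices flagged:", np.flatnonzero(flags))
```

No covariance matrix is estimated anywhere in the sketch; only univariate quartiles of the projected data are needed, which is the motivation stated in the abstract.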
