Paper Title

Factor Analysis of Mixed Data for Anomaly Detection

Authors

Matthew Davidow, David S. Matteson

Abstract

Anomaly detection aims to identify observations that deviate from the typical pattern of data. In practice, anomalous observations may correspond to financial fraud, health risks, or incorrectly measured data. We show that detecting anomalies in high-dimensional mixed data is enhanced by first embedding the data and then assessing an anomaly scoring scheme. We focus on unsupervised detection and the continuous and categorical (mixed) variable case. We propose a kurtosis-weighted Factor Analysis of Mixed Data for anomaly detection, FAMDAD, to obtain a continuous embedding for anomaly scoring. We illustrate that anomalies are highly separable in the first and last few ordered dimensions of this space, and test various anomaly scoring experiments within this subspace. Results are illustrated for both simulated and real datasets, and the proposed approach (FAMDAD) is highly accurate for high-dimensional mixed data throughout these diverse scenarios.
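
The abstract does not spell out the embedding or the kurtosis weighting, so the following is only a rough sketch of the general idea, not the authors' FAMDAD implementation. It assumes a textbook FAMD-style embedding (standardized continuous columns, indicator columns scaled by the square root of each level's proportion, then an SVD) and a simple score that sums squared standardized coordinates over the first and last `k` components. The function names `famd_embed` and `anomaly_scores`, the parameter `k`, and the simulated data are all hypothetical, and the kurtosis weighting is deliberately left out because its form is not given here.

```python
import numpy as np

def famd_embed(X_num, X_cat):
    """FAMD-style embedding (assumed textbook form, not the paper's exact method)."""
    # Standardize continuous columns to zero mean and unit variance.
    Z = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
    # One-hot encode each categorical column (integer-coded levels assumed).
    blocks = []
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        blocks.append((X_cat[:, [j]] == levels[None, :]).astype(float))
    G = np.hstack(blocks)
    p = G.mean(axis=0)                # proportion of each level
    G = (G - p) / np.sqrt(p)          # centre and down-weight common levels
    M = np.hstack([Z, G])
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    keep = s > 1e-10 * s[0]           # drop numerically null directions
    return U[:, keep] * s[keep]       # component scores, ordered by variance

def anomaly_scores(F, k=2):
    """Sum of squared standardized scores on the first and last k components."""
    S = F / F.std(axis=0)
    d = F.shape[1]
    idx = np.unique(np.r_[0:min(k, d), max(d - k, 0):d])
    return (S[:, idx] ** 2).sum(axis=1)

# Simulated mixed data: 3 continuous columns, 2 categorical columns.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(200, 3))
X_cat = np.column_stack([rng.integers(0, 3, 200), rng.integers(0, 2, 200)])
X_num[0] = [6.0, -6.0, 6.0]           # one injected, hypothetical anomaly

F = famd_embed(X_num, X_cat)
scores = anomaly_scores(F, k=2)
print("embedding shape:", F.shape)
print("highest-scoring rows:", np.argsort(scores)[::-1][:5])
```

Standardizing each component before squaring lets the low-variance trailing dimensions contribute on the same footing as the leading ones, which is one simple way to use both ends of the ordered embedding that the abstract highlights.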
