Paper Title

A flexible outlier detector based on a topology given by graph communities

Authors

Terrades, O. Ramos, Berenguel, A., Gil, D.

Abstract

Outlier, or anomaly, detection is essential for the optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, and security threat detection. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in unbalanced problems with small sample sizes. However, a main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that our approach overall outperforms both local and global strategies in multi-view and single-view settings.
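The pipeline outlined in the abstract (a mutual nearest neighbor graph, neighborhoods derived from the graph, and a local measure of label heterogeneity) can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: the graph is unweighted, connected components stand in for proper community detection, each sample receives a single neighborhood rather than multiple ones, and the score is simply the fraction of neighborhood members whose label differs from the sample's own. All function names and the parameter `k` are hypothetical.

```python
import math


def mutual_knn_graph(points, k):
    """Adjacency sets where i and j are linked only if each one is
    among the other's k nearest neighbors (Euclidean distance)."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        knn.append(set(others[:k]))
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def connected_components(adj):
    """Assumption: connected components are used as a crude stand-in
    for the graph communities mentioned in the abstract."""
    seen, components = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def heterogeneity_scores(points, labels, k=2):
    """Score each sample by the fraction of the other members of its
    neighborhood that carry a different label (hypothetical measure)."""
    adj = mutual_knn_graph(points, k)
    scores = [0.0] * len(points)
    for component in connected_components(adj):
        for i in component:
            others = [j for j in component if j != i]
            if not others:
                # Assumption: a sample with no mutual neighbors is
                # treated as maximally suspicious.
                scores[i] = 1.0
                continue
            differing = sum(1 for j in others if labels[j] != labels[i])
            scores[i] = differing / len(others)
    return scores


# Two well-separated clusters; sample 3 carries a label that
# disagrees with the rest of its cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
scores = heterogeneity_scores(points, labels, k=2)
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
```

In this toy data the mislabeled sample receives the highest score because every other member of its neighborhood disagrees with its label. A closer rendering of the described method would replace `connected_components` with a community detection algorithm on a weighted graph and combine the scores from several neighborhoods per sample.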
