Paper title

Two-sample test based on Self-Organizing Maps

Authors

Alejandro Álvarez-Ayllón, Manuel Palomo-Duarte, Juan-Manuel Dodero

Abstract

Machine-learning classifiers can be leveraged as two-sample statistical tests. Suppose each sample is assigned a different label and a classifier discriminates between them with better-than-chance accuracy. In that case, we can infer that the two samples originate from different populations. However, many types of models, such as neural networks, behave as a black box for the user: they can reject the hypothesis that both samples originate from the same population, but they offer no insight into how the samples differ. Self-Organizing Maps are a dimensionality reduction technique, initially devised as a data visualization tool that displays emergent properties, and they are also useful for classification tasks. Since they can be used as classifiers, they can also be used as a two-sample statistical test. And because their original purpose is visualization, they can also offer insight into how the samples differ.
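The idea in the abstract can be sketched in a few lines of Python. The code below is an illustrative sketch only, not the authors' implementation: the function names (`train_som`, `best_matching_units`, `som_two_sample_test`), the 5x5 grid, the learning-rate and neighbourhood schedules, the 50/50 train/test split, and the normal approximation to the binomial null are all assumptions made for this example. It also assumes the two samples have equal size, so that chance accuracy is 0.5.

```python
import math
import numpy as np


def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with the classic online update rule."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every unit, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors from randomly chosen training points.
    weights = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 + (0.5 - sigma0) * frac  # neighbourhood radius shrinks towards 0.5
            x = data[idx]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(weights, data):
    """Index of the closest codebook vector for every row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def som_two_sample_test(sample_a, sample_b, seed=0, **som_kwargs):
    """Classifier two-sample test using a SOM with majority-vote unit labels.

    Returns (held-out accuracy, one-sided p-value, per-unit label counts).
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([sample_a, sample_b])
    y = np.concatenate([np.zeros(len(sample_a), dtype=int),
                        np.ones(len(sample_b), dtype=int)])
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    half = len(x) // 2
    x_train, y_train, x_test, y_test = x[:half], y[:half], x[half:], y[half:]

    weights = train_som(x_train, seed=seed, **som_kwargs)

    # Count how many training points of each label land on each unit.
    counts = np.zeros((len(weights), 2))
    np.add.at(counts, (best_matching_units(weights, x_train), y_train), 1)
    unit_label = counts.argmax(axis=1)  # ties and empty units default to label 0

    predictions = unit_label[best_matching_units(weights, x_test)]
    accuracy = float((predictions == y_test).mean())

    # Under the null hypothesis, accuracy is approximately N(0.5, 0.25 / n_test).
    z = (accuracy - 0.5) / math.sqrt(0.25 / len(y_test))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return accuracy, p_value, counts


# Usage: two synthetic 2-D samples whose means differ by 4 in each coordinate.
demo_rng = np.random.default_rng(42)
sample_a = demo_rng.normal(0.0, 1.0, size=(200, 2))
sample_b = demo_rng.normal(4.0, 1.0, size=(200, 2))
accuracy, p_value, counts = som_two_sample_test(sample_a, sample_b)
print(f"held-out accuracy = {accuracy:.3f}, p-value = {p_value:.3g}")
```

The returned `counts` array is what distinguishes this from a black-box classifier test: reshaped to the grid (here `counts.reshape(5, 5, 2)`), it shows which regions of the map are dominated by one sample or the other, and the codebook vectors of those units indicate where in feature space the samples differ.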
