Paper Title

Unsupervised Feature Selection via Multi-step Markov Transition Probability

Authors

Yan Min, Mao Ye, Liang Tian, Yulin Jian, Ce Zhu, Shangming Yang

Abstract

Feature selection is a widely used dimensionality reduction technique that selects a subset of the original features and is valued for its interpretability. Many methods have been proposed and achieve good results; most of them focus on the relationships between adjacent data points, while possible associations between data pairs that may not be adjacent are usually neglected. In contrast to previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probabilities to describe the relation between any pair of data points. Two ways, from a positive and a negative viewpoint, are employed to preserve the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached within a certain number of steps is used to describe the relation between two points, and the features that can keep the resulting compact data structure are selected. From the negative viewpoint, the minimum transition probability that can be reached within a certain number of steps is used to describe the relation between two points, and, conversely, the features that least maintain the resulting loose data structure are selected. The two ways can also be combined, so three algorithms are proposed in total. Our main contributions are a novel feature selection approach that uses multi-step transition probabilities to characterize the data structure, and three algorithms, derived from the positive and negative viewpoints, for preserving that structure. The performance of our approach is compared with state-of-the-art methods on eight real-world data sets, and the experimental results show that the proposed MMFS is effective for unsupervised feature selection.
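To make the idea of a multi-step relation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel with bandwidth `sigma` and no self-transitions for the one-step transition matrix; the abstract does not specify either choice. The function names `transition_matrix`, `matmul`, and `multistep_relations` are hypothetical. The sketch computes only the element-wise maximum and minimum transition probabilities over 1 to `max_steps` steps; how MMFS then scores features against these matrices is not described in the abstract and is not shown.

```python
import math


def transition_matrix(points, sigma=1.0):
    """Row-stochastic one-step transition matrix.

    Assumption: similarities come from a Gaussian kernel and
    self-transitions are disallowed. Requires at least two points.
    """
    n = len(points)
    P = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(row)
        if total == 0.0:
            raise ValueError("point %d has no reachable neighbour; increase sigma" % i)
        P.append([w / total for w in row])
    return P


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def multistep_relations(P, max_steps):
    """Element-wise max and min of P^1, P^2, ..., P^max_steps."""
    n = len(P)
    power = [row[:] for row in P]
    p_max = [row[:] for row in P]
    p_min = [row[:] for row in P]
    for _ in range(max_steps - 1):
        power = matmul(power, P)  # next power of the transition matrix
        for i in range(n):
            for j in range(n):
                p_max[i][j] = max(p_max[i][j], power[i][j])
                p_min[i][j] = min(p_min[i][j], power[i][j])
    return p_max, p_min


if __name__ == "__main__":
    # Two tight pairs of points that are far apart from each other.
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    P = transition_matrix(data, sigma=1.0)
    p_max, p_min = multistep_relations(P, max_steps=3)
    for row in p_max:
        print(["%.3g" % v for v in row])
```

In this toy data set, points within the same tight pair keep a large maximum transition probability, while points in different pairs stay weakly related even after several steps, which is the kind of non-adjacent relation the abstract refers to.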
