Title

The Kendall Interaction Filter for Variable Interaction Screening in Ultra High Dimensional Classification Problems

Authors

Anzarmou, Youssef; Mkhadri, Abdallah; Oualkacha, Karim

Abstract

Accounting for important interaction effects can improve the prediction of many statistical learning models. Identifying relevant interactions, however, is challenging owing to their ultrahigh-dimensional nature. Interaction screening strategies can alleviate this problem. However, because interaction effects tend to have heavier-tailed distributions and complex dependence structures, innovative robust and/or model-free methods for screening interactions are needed to scale the analysis of complex, high-throughput data. In this work, we develop a new model-free interaction screening method, termed the Kendall Interaction Filter (KIF), for classification in high-dimensional settings. The KIF method proposes a weighted-sum measure that compares the overall Kendall's $\tau$ of each pair of predictors with its within-cluster Kendall's $\tau$, in order to select interacting pairs of features. The proposed KIF measure captures interactions relevant to the class (cluster) response variable, handles continuous, categorical, or mixed continuous-categorical features, and is invariant under monotonic transformations. We show that the KIF measure enjoys the sure screening property in the high-dimensional setting under mild conditions, without imposing sub-exponential moment assumptions on the features' distributions. Simulation studies illustrate the favorable behavior of the proposed method relative to methods in the same category, and real data analyses demonstrate its utility.
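The abstract describes the KIF measure only in words: a weighted sum that compares the overall Kendall's $\tau$ of a predictor pair with its within-cluster values. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the score for a pair $(j, k)$ is $\sum_{l} \hat\pi_l \, |\hat\tau^{(l)}_{jk} - \hat\tau_{jk}|$, where $\hat\pi_l$ is the proportion of samples in class $l$, $\hat\tau^{(l)}_{jk}$ is Kendall's tau-b computed within class $l$, and $\hat\tau_{jk}$ is computed on all samples; the exact weights and normalization used in the paper may differ. The helper names `kendall_tau_b`, `kif_score`, `kif_screen` and the toy generator `make_toy_data` are hypothetical.

```python
"""Hypothetical sketch of a Kendall-tau-based interaction screening score.

Assumed form (not confirmed by the abstract):
    KIF(j, k) = sum_l pi_l * |tau_jk^(l) - tau_jk|
where tau_jk is Kendall's tau-b of features j and k on all samples,
tau_jk^(l) is the same quantity within class l, and pi_l is the class proportion.
"""
import itertools
import math
import random
from collections import Counter


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Naive O(n^2) Kendall's tau-b; tied pairs add nothing to the numerator."""
    n = len(x)
    s = 0          # concordant minus discordant pairs
    untied_x = 0   # pairs not tied in x
    untied_y = 0   # pairs not tied in y
    for i in range(n):
        for j in range(i + 1, n):
            sx = sign(x[i] - x[j])
            sy = sign(y[i] - y[j])
            s += sx * sy
            untied_x += sx != 0
            untied_y += sy != 0
    denom = math.sqrt(untied_x * untied_y)
    return s / denom if denom > 0 else 0.0


def kif_score(xj, xk, y):
    """Class-proportion-weighted sum of |within-class tau - overall tau| (assumed form)."""
    n = len(y)
    overall = kendall_tau_b(xj, xk)
    score = 0.0
    for label, count in Counter(y).items():
        idx = [i for i in range(n) if y[i] == label]
        within = kendall_tau_b([xj[i] for i in idx], [xk[i] for i in idx])
        score += (count / n) * abs(within - overall)
    return score


def kif_screen(columns, y, d):
    """Score every feature pair and keep the d highest-scoring pairs (hypothetical helper)."""
    scores = {
        (j, k): kif_score(columns[j], columns[k], y)
        for j, k in itertools.combinations(range(len(columns)), 2)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:d], scores


def make_toy_data(n=200, p=6, seed=0):
    """Toy data: features 0 and 1 are positively associated in class 1 and
    negatively associated in class 0, so their overall tau is near zero.
    All other features are independent noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    for i in range(n):
        z = rng.gauss(0, 1)
        columns[0][i] = z + 0.3 * rng.gauss(0, 1)
        columns[1][i] = (z if y[i] == 1 else -z) + 0.3 * rng.gauss(0, 1)
    return columns, y
```

In the toy data, each within-class $\tau$ of features 0 and 1 is large in magnitude while their overall $\tau$ is close to zero, so `kif_screen(*make_toy_data(), d=1)` is expected to return `[(0, 1)]`. Because Kendall's $\tau$ depends only on the signs of pairwise differences, replacing a feature by a strictly increasing transform (for example `math.exp`) leaves the score unchanged, which mirrors the invariance stated in the abstract. The naive $O(n^2)$ loop is chosen for readability; an $O(n \log n)$ algorithm such as Knight's would be preferable for large $n$.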
