Paper title

A note of feature screening via rank-based coefficient of correlation

Author

Chen, Li-Pang

Abstract

Feature screening is useful and popular for detecting informative predictors in ultrahigh-dimensional data before conducting subsequent statistical analysis or constructing statistical models. While a large body of feature screening procedures has been developed, most of them are restricted to examining either continuous or discrete responses. Moreover, even though many model-free feature screening methods have been proposed, they impose additional assumptions to ensure their theoretical results. To address those difficulties and provide a simple implementation, in this paper we extend the rank-based coefficient of correlation proposed by Chatterjee (2020) to develop a feature screening procedure. We show that this new screening criterion is able to deal with both continuous and discrete responses. Theoretically, the sure screening property is established to justify the proposed method. Simulation studies demonstrate that predictors with nonlinear and oscillatory trajectories are successfully detected regardless of the distribution of the response.
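The abstract does not spell out the screening criterion, so what follows is only a minimal Python sketch under stated assumptions, not the paper's procedure. It implements Chatterjee's (2020) coefficient in its tie-corrected form: after sorting the pairs by x, with r_i = #{j : y_j <= y_(i)} and l_i = #{j : y_j >= y_(i)}, xi_n = 1 - n * sum_i |r_(i+1) - r_i| / (2 * sum_i l_i (n - l_i)). Each predictor X_k is then scored by xi_n(X_k, Y) and the d largest are kept. The helper names `xi_coefficient` and `xi_screen`, the scoring direction xi_n(X_k, Y), and the default cutoff d = floor(n / log n) (a common SIS-style choice) are all hypothetical assumptions.

```python
import math
import random
from bisect import bisect_left, bisect_right


def xi_coefficient(x, y, rng=None):
    """Chatterjee's (2020) xi_n(X, Y) in its tie-corrected form.

    Ties in x are broken uniformly at random; ties in y (e.g. a discrete
    response) are handled by the r_i / l_i counts in the formula.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    rng = rng or random.Random(0)
    # Order observations by x, breaking ties in x at random.
    jitter = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=lambda i: (x[i], jitter[i]))
    y_sorted = [y[i] for i in order]
    y_all = sorted(y)
    # r_i = #{j : y_j <= y_(i)} and l_i = #{j : y_j >= y_(i)}.
    r = [bisect_right(y_all, v) for v in y_sorted]
    l = [n - bisect_left(y_all, v) for v in y_sorted]
    den = 2 * sum(li * (n - li) for li in l)
    if den == 0:
        raise ValueError("xi_n is undefined when y is constant")
    num = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - num / den


def xi_screen(columns, y, d=None, rng=None):
    """Hypothetical screening step: rank predictors by xi_n(X_k, Y), keep top d.

    The default d = floor(n / log n) is an assumed SIS-style cutoff,
    not necessarily the one used in the paper.
    """
    n = len(y)
    if d is None:
        d = max(1, int(n / math.log(n)))
    scores = [xi_coefficient(col, y, rng) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda k: -scores[k])
    return ranked[:d], scores


if __name__ == "__main__":
    gen = random.Random(1)
    n, p = 300, 20
    cols = [[gen.uniform(-1, 1) for _ in range(n)] for _ in range(p)]
    # Oscillatory signal in predictor 0; the other predictors are pure noise.
    y_cont = [math.sin(6 * cols[0][i]) + 0.1 * gen.gauss(0, 1) for i in range(n)]
    # Discrete (binary) response driven by the same predictor.
    y_bin = [1 if math.sin(6 * cols[0][i]) > 0 else 0 for i in range(n)]
    for name, y in [("continuous", y_cont), ("binary", y_bin)]:
        kept, scores = xi_screen(cols, y, d=3)
        print(name, kept, round(scores[0], 3))
```

Because r_i and l_i count ties in y directly, the same function accepts a binary or count response without modification, which is in the spirit of the abstract's claim about handling continuous and discrete responses. Note that xi_n is not symmetric: xi_n(X, Y) measures how well Y is determined by X, so an oscillatory, non-monotone link such as sin(6x) still yields a large score.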