Paper Title
FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features
Paper Authors
Paper Abstract
This paper proposes FREEtree, a tree-based method for high-dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, such as Random Forests, do not perform well when features are correlated and do not account for data observed over time. FREEtree handles longitudinal data by using a piecewise random effects model. It also exploits the network structure of the features by first clustering them with weighted correlation network analysis (WGCNA). It then conducts a screening step within each cluster of features and a selection step among the surviving features, which provides a relatively unbiased way to select features. By using the dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREEtree maintains its interpretability and improves its computational efficiency. Simulation results show that FREEtree outperforms other tree-based methods in prediction accuracy, feature selection accuracy, and the ability to recover the underlying structure.
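The cluster, screen, select flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: connected components of a soft-thresholded correlation graph stand in for WGCNA, absolute correlation with the response stands in for a tree-based importance score, and the names `cluster_features`, `screen_then_select`, `beta`, `cutoff`, `keep_frac`, and `final_k` are hypothetical. The piecewise random effects model and the principal-component regression at the leaves are omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cluster_features(columns, beta=6, cutoff=0.1):
    """Stand-in for WGCNA (assumption): link two features when the
    soft-thresholded adjacency |cor|**beta exceeds cutoff, then return
    the connected components as clusters of feature indices."""
    p = len(columns)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) ** beta > cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(p):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def screen_then_select(columns, y, keep_frac=0.5, final_k=2):
    """Screen within each cluster, then select among the survivors.
    The score |cor(feature, y)| is an illustrative stand-in for a
    tree-based importance measure."""
    score = {j: abs(pearson(col, y)) for j, col in enumerate(columns)}
    survivors = []
    for cluster in cluster_features(columns):
        ranked = sorted(cluster, key=score.get, reverse=True)
        n_keep = max(1, math.ceil(keep_frac * len(ranked)))
        survivors.extend(ranked[:n_keep])  # screening step
    # Selection step: pool the survivors and keep the best final_k.
    return sorted(survivors, key=score.get, reverse=True)[:final_k]


# Synthetic demo: features 0-2 share latent factor a, features 3-5 share
# latent factor b, features 6-7 are pure noise; only a drives the response.
random.seed(0)
n = 200
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
columns = [[v + random.gauss(0, 0.3) for v in a] for _ in range(3)]
columns += [[v + random.gauss(0, 0.3) for v in b] for _ in range(3)]
columns += [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
y = [2 * v + random.gauss(0, 0.5) for v in a]

selected = screen_then_select(columns, y)
print("selected feature indices:", selected)
```

Screening inside each cluster before pooling means a large block of mutually correlated features cannot crowd out smaller clusters in the final selection, which is the intuition behind the two-stage design described in the abstract.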