Paper title
ecpc: An R-package for generic co-data models for high-dimensional prediction
Paper authors
Paper abstract
High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, which provides complementary data not on the samples but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R-package ecpc originally accommodated various and possibly multiple co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, was handled by adaptive discretisation, potentially modelling it inefficiently and losing information. Here, we present an extension of the method and software to generic co-data models, particularly for continuous co-data. At its basis lies a classical linear regression model that regresses the prior variance weights on the co-data. The weights of the co-data variables are then estimated with empirical Bayes moment estimation. Once the estimation procedure is placed in the classical regression framework, extension to generalised additive and shape-constrained co-data models is straightforward. Furthermore, we show how ridge penalties may be transformed to elastic net penalties with the R-package squeezy. In simulation studies, we first compare various co-data models for continuous co-data from the extension with the original method. Secondly, we compare its variable selection performance with that of other variable selection methods. We also demonstrate use of the package in several examples throughout the paper.
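The abstract describes the co-data model only in words. The following is a minimal sketch of the idea in Python with NumPy, under simplifying assumptions. It is not the ecpc API (ecpc is an R package), and all function names are hypothetical. The sketch assumes a linear model with known noise variance `sigma2`, a single co-data source `Z`, a global prior variance absorbed into the co-data weights `gamma`, one common initial ridge penalty `lam0`, and unconstrained least squares for the moment equations. ecpc itself also handles generalised linear and Cox models and multiple co-data sources, and its estimation differs in details not shown here. The point illustrated is that prior variances are modelled linearly in the co-data, and that the moment equations for `gamma` then form a classical linear regression.

```python
import numpy as np


def prior_variances(Z, gamma, floor=1e-8):
    """Linear co-data model for the prior variances: v = Z @ gamma.

    Z is a (p, G) co-data matrix with one row per variable (e.g. group
    dummies, or a basis expansion of a continuous co-data variable). In this
    sketch any global scale tau^2 is absorbed into gamma. Variances are
    floored at a small positive value so that the penalties stay finite.
    """
    return np.maximum(Z @ gamma, floor)


def ridge_penalties(v, sigma2=1.0):
    # Prior beta_k ~ N(0, v_k) with Gaussian noise variance sigma2: the
    # posterior mode is a ridge estimate with penalty sigma2 / v_k.
    return sigma2 / v


def generalised_ridge(X, y, penalties):
    # Solve the penalised normal equations (X'X + diag(penalties)) beta = X'y.
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)


def moment_regression(X, y, Z, lam0=10.0, sigma2=1.0):
    """Write the moment equations as a classical linear regression in gamma.

    With a common initial penalty lam0, beta_hat = H y, where
    H = (X'X + lam0 I)^{-1} X'. Under the prior,
    E[beta_hat_k^2] = sum_j (H X)_kj^2 v_j + sigma2 (H H')_kk,
    which is linear in gamma once v = Z @ gamma. Returns the design
    ((H X)**2) @ Z and the response beta_hat**2 - sigma2 * diag(H H').
    """
    p = X.shape[1]
    H = np.linalg.solve(X.T @ X + lam0 * np.eye(p), X.T)
    beta_hat = H @ y
    response = beta_hat**2 - sigma2 * np.sum(H**2, axis=1)
    design = ((H @ X) ** 2) @ Z
    return design, response


def estimate_gamma(design, response):
    # Ordinary least squares for the co-data weights (no constraints here).
    gamma, *_ = np.linalg.lstsq(design, response, rcond=None)
    return gamma


def simulate_grouped(n=100, p=200, n_important=20, v_important=1.0,
                     v_rest=1e-3, sigma2=1.0, seed=1):
    # Hypothetical example data: two groups of variables as dummy co-data.
    rng = np.random.default_rng(seed)
    Z = np.zeros((p, 2))
    Z[:n_important, 0] = 1.0
    Z[n_important:, 1] = 1.0
    gamma_true = np.array([v_important, v_rest])
    X = rng.standard_normal((n, p))
    beta = rng.normal(0.0, np.sqrt(Z @ gamma_true))
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return X, y, Z, gamma_true


if __name__ == "__main__":
    X, y, Z, gamma_true = simulate_grouped()
    design, response = moment_regression(X, y, Z)
    gamma_hat = estimate_gamma(design, response)
    penalties = ridge_penalties(prior_variances(Z, gamma_hat))
    beta_hat = generalised_ridge(X, y, penalties)
    print("estimated co-data weights:", gamma_hat)
    print("penalty, group 1 vs group 2:", penalties[0], penalties[-1])
```

Because the moment equations are linear in `gamma`, replacing the dummy matrix `Z` with a spline basis of a continuous co-data variable would, in this sketch, yield a generalised additive co-data model without changing the estimator. A shape-constrained co-data model would instead need a constrained least-squares solver in place of `lstsq`.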