Identifying important predictors in large data bases -- multiple testing and model selection

Paper title

Identifying important predictors in large data bases -- multiple testing and model selection

Authors

Malgorzata Bogdan, Florian Frommlet

Abstract

This is a chapter of the forthcoming Handbook of Multiple Testing. We consider a variety of model selection strategies in a high-dimensional setting, where the number of potential predictors p is large compared to the number of available observations n. In particular, modifications of information criteria that are suitable for the case p > n are introduced and compared with a variety of penalized likelihood methods, notably SLOPE and SLOBE. The focus is on methods that control the false discovery rate (FDR) in terms of model identification. Theoretical results are provided with respect to both model identification and prediction, and various simulation results are presented that illustrate the performance of the different methods in different situations.
