Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Paper title

Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Authors

Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg

Abstract

Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical frequentist statistical theory, which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of interest are those of the full model on all predictors, and the submodel view, in which the parameters of interest are those of the selected model only; we then focus on the latter. In the context of L1-penalized regression, we compare proposals for submodel inference (selective inference) via confidence intervals, as available to applied researchers in software packages, using a simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. Our findings indicate that the frequentist properties of selective confidence intervals are generally acceptable, but that, except for the most conservative methods, the desired coverage levels are not guaranteed in all scenarios. The choice of inference method can have a large impact on the resulting interval estimates, so users must be acutely aware of the goal of inference in order to interpret and communicate the results. Currently available software packages are not yet very user-friendly or robust, which might affect their use in practice. In summary, we find submodel inference after selection useful for experienced statisticians assessing the importance of individual selected predictors in future applications.
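To make the distinction between the full model view and the submodel view concrete, here is a minimal Python/NumPy sketch. It is not the authors' code and does not reproduce any of the software packages compared in the study. It fits the Lasso with a simple hand-written coordinate-descent routine (`lasso_cd` and `submodel_target` are hypothetical helpers), takes the nonzero coefficients as the selected model M, and computes the submodel target beta_M = (X_M^T X_M)^(-1) X_M^T mu with mu = E[y], the projection parameter commonly used as the inferential target in the selective-inference literature. The simulated design (n = 200, p = 5, a correlation of 0.8 between x1 and x2, the true coefficients and lambda = 0.3) is an illustrative assumption, not data from the paper.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             n_iter: int = 1000, tol: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: Lasso without intercept via cyclic coordinate
    descent on (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) x_j'x_j for each column
    r = np.array(y, dtype=float)        # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of x_j with the partial residual y - X_{-j} b_{-j}
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]    # keep the residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:            # stop once a full sweep barely moves b
            break
    return b


def submodel_target(X: np.ndarray, mu: np.ndarray, active) -> np.ndarray:
    """Hypothetical helper: submodel (projection) target
    beta_M = (X_M'X_M)^{-1} X_M' mu, where mu = E[y]."""
    XM = X[:, active]
    return np.linalg.solve(XM.T @ XM, XM.T @ mu)


# Hypothetical simulated data (illustrative assumption, not from the paper):
# x2 is strongly correlated with x1 (population correlation 0.8).
rng = np.random.default_rng(2024)
n, p = 200, 5
Z = rng.standard_normal((n, p))
X = Z.copy()
X[:, 1] = 0.8 * Z[:, 0] + 0.6 * Z[:, 1]
X = (X - X.mean(axis=0)) / X.std(axis=0)         # center and scale columns
beta_full = np.array([1.0, 0.5, 0.0, 0.0, 0.0])  # full-model coefficients
mu = X @ beta_full                               # E[y] under the full model
y = mu + rng.standard_normal(n)

lam = 0.3
b_lasso = lasso_cd(X, y, lam)
selected = np.flatnonzero(b_lasso)
print("Lasso selected columns:", selected.tolist())
print("Full-model coefficients for them:", beta_full[selected].round(3).tolist())
print("Submodel targets:", submodel_target(X, mu, selected).round(3).tolist())

# The target for x1 depends on which other predictors are in the model.
print("x1 target, M = {x1, x2}:", submodel_target(X, mu, [0, 1]).round(3).tolist())
print("x1 target, M = {x1}:", submodel_target(X, mu, [0]).round(3).tolist())
```

When x2 is left out of the model, the submodel target for x1 absorbs part of the effect of its correlated neighbor (here 1 + 0.5 r, where r is the sample correlation of x1 and x2), so it no longer equals the full-model coefficient of 1.0. Under the submodel view, selective confidence intervals are intervals for beta_M, not for the full-model coefficients, which is why being clear about the goal of inference matters when interpreting them.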
