Paper title

Towards the interpretation of time-varying regularization parameters in streaming penalized regression models

Authors

Lenka Zboňáková, Ricardo Pio Monti, Wolfgang Karl Härdle

Abstract

High-dimensional streaming datasets are ubiquitous in modern applications. Examples range from finance and e-commerce to the study of biomedical and neuroimaging data. As a result, many novel algorithms have been proposed to address the challenges posed by such datasets. In this work, we focus on the use of $\ell_1$-regularized linear models in the context of (possibly non-stationary) streaming data. Recently, it has been noted that the choice of the regularization parameter is fundamental in such models, and several methods have been proposed which iteratively tune this parameter in a time-varying manner, thereby allowing the underlying sparsity of the estimated models to vary. Moreover, in many applications, inference on the regularization parameter may itself be of interest, as this parameter is related to the underlying sparsity of the model. However, in this work, we highlight and provide extensive empirical evidence regarding how various (often unrelated) statistical properties of the data can lead to changes in the regularization parameter. In particular, through various synthetic experiments, we demonstrate that changes in the regularization parameter may be driven by changes in the true underlying sparsity, the signal-to-noise ratio, or even model misspecification. The purpose of this letter is, therefore, to highlight and catalog the various statistical properties which induce changes in the associated regularization parameter. We conclude by presenting two applications, one relating to financial data and another to neuroimaging data, where the aforementioned discussion is relevant.
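The abstract's central claim, that the selected regularization parameter can move even when the true sparsity does not, can be illustrated with a small batch experiment. The sketch below is not the authors' streaming method: it assumes a plain lasso objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, fits it by coordinate descent, and picks $\lambda$ from a grid by hold-out prediction error. The function names (`soft_threshold`, `lasso_cd`, `select_lambda`), the grid, and all simulation settings are hypothetical choices made for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scale x_j'x_j / n
    resid = y.copy()                    # equals y - X @ beta, since beta starts at 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # remove coordinate j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add the updated contribution back
    return beta

def select_lambda(X, y, grid):
    """Pick the grid value with the lowest hold-out prediction error."""
    half = len(y) // 2
    errors = []
    for lam in grid:
        beta = lasso_cd(X[:half], y[:half], lam)
        errors.append(np.mean((y[half:] - X[half:] @ beta) ** 2))
    return grid[int(np.argmin(errors))]

# Assumed simulation settings: the true sparsity is held fixed (5 of 20
# coefficients are nonzero) and only the noise level sigma changes.
rng = np.random.default_rng(0)
n, p, k = 200, 20, 5
beta_true = np.zeros(p)
beta_true[:k] = 1.0
X = rng.standard_normal((n, p))
grid = np.logspace(-3, 0, 20)

for sigma in (0.1, 1.0, 3.0):
    y = X @ beta_true + sigma * rng.standard_normal(n)
    print(f"sigma={sigma}: selected lambda={select_lambda(X, y, grid):.4f}")
```

If the printed value of $\lambda$ differs across noise levels while `beta_true` is unchanged, that is a toy instance of the abstract's point: a shift in the tuned parameter cannot, on its own, be read as a shift in sparsity. The exact values depend on the seed, the grid, and the hold-out split.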
