Paper Title

On Measuring Model Complexity in Heteroscedastic Linear Regression

Authors

Bo Luan, Yoonkyung Lee, Yunzhang Zhu

Abstract

Heteroscedasticity is common in real-world applications and is often handled by incorporating case weights into a modeling procedure. Intuitively, models fitted with different weight schemes would have different levels of complexity depending on how well the weights match the inverse of the error variances. However, existing statistical theories on model complexity, also known as model degrees of freedom, were primarily established under the assumption of equal error variances. In this work, we focus on linear regression procedures and seek to extend the existing measures to a heteroscedastic setting. Our analysis of the weighted least squares method reveals some interesting properties of the extended measures. In particular, we find that they depend on both the weights used for model fitting and those used for model evaluation. Moreover, modeling heteroscedastic data with optimal weights generally results in fewer degrees of freedom than with equal weights, and the size of the reduction depends on the unevenness of the error variances. This provides additional insights into weighted modeling procedures that are useful in risk estimation and model selection.
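
To make the roles of the two sets of weights concrete, here is a minimal Python sketch. It does not reproduce the paper's extended measures, which the abstract does not define. As an assumption, it uses a generic weighted covariance penalty, sum_i v_i Cov(yhat_i, y_i), where w are the fitting weights and v are the evaluation weights. The names `wls_hat_matrix` and `covariance_penalty`, the simulated design, and the variance pattern are all hypothetical.

```python
import numpy as np


def wls_hat_matrix(X, w):
    """Hat matrix H_w = X (X' W X)^{-1} X' W for case weights w (W = diag(w))."""
    XtW = X.T * w  # broadcasting scales column i of X.T by w[i], giving X' W
    return X @ np.linalg.solve(XtW @ X, XtW)


def covariance_penalty(X, fit_w, eval_w, sigma2):
    """Illustrative quantity (an assumption, not the paper's definition).

    For y with independent errors of variance sigma2[i], the WLS fit is
    yhat = H_w y, so Cov(yhat_i, y_i) = H_w[i, i] * sigma2[i].
    Returns sum_i eval_w[i] * Cov(yhat_i, y_i).
    """
    H = wls_hat_matrix(X, fit_w)
    return float(np.sum(eval_w * np.diag(H) * sigma2))


# Simulated design with uneven error variances (hypothetical setup).
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
sigma2 = np.exp(rng.normal(size=n))  # error variances, known in this toy case

equal_w = np.ones(n)
optimal_w = 1.0 / sigma2  # fitting weights matching the inverse variances

# Same evaluation weights (all ones), different fitting weights.
pen_equal = covariance_penalty(X, equal_w, equal_w, sigma2)
pen_optimal = covariance_penalty(X, optimal_w, equal_w, sigma2)

# Evaluation weights equal to the inverse variances.
pen_inv_eval = covariance_penalty(X, equal_w, optimal_w, sigma2)

print(pen_equal, pen_optimal, pen_inv_eval)
```

In this sketch, unit evaluation weights give a penalty for the optimally weighted fit that is no larger than the penalty for the equally weighted fit, which follows from the Gauss-Markov theorem. With evaluation weights equal to the inverse variances, the quantity reduces to the trace of the hat matrix, which is p for any fitting weights. So even this simple quantity changes with both the fitting and the evaluation weights.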
