Title
Random Noise vs State-of-the-Art Probabilistic Forecasting Methods: A Case Study on CRPS-Sum Discrimination Ability
Authors
Abstract
Recent advances in machine learning have enabled the development of complex multivariate probabilistic forecasting models. It is therefore pivotal to have a precise evaluation method to gauge the performance and predictive power of these complex methods. Several evaluation metrics have been proposed for this purpose (such as the Energy Score, the Dawid-Sebastiani score, and the variogram score); however, they cannot reliably measure the performance of a probabilistic forecaster. Recently, CRPS-Sum has gained prominence as a reliable metric for multivariate probabilistic forecasting. This paper presents a systematic evaluation of CRPS-Sum to understand its discrimination ability. We show that the statistical properties of the target data affect the discrimination ability of CRPS-Sum. Furthermore, we highlight that the CRPS-Sum calculation overlooks the performance of the model on each dimension. These flaws can lead to an incorrect assessment of model performance. Finally, with experiments on a real-world dataset, we demonstrate that the shortcomings of CRPS-Sum give a misleading indication of the performance of probabilistic forecasting methods. We show that a dummy model that looks like random noise can easily obtain a better CRPS-Sum than the state-of-the-art method.
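The claim that the CRPS-Sum calculation overlooks per-dimension performance can be illustrated with a small toy sketch. This is not the authors' experiment. It assumes that CRPS-Sum is computed by summing forecast samples and targets across dimensions and then averaging a sample-based CRPS over time, and it omits any normalization. The function names (`crps_from_samples`, `crps_sum`, `mean_marginal_crps`) and the synthetic two-dimensional data are hypothetical, chosen only for illustration.

```python
import numpy as np


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    term_spread = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - term_spread


def crps_sum(samples, target):
    """Unnormalized CRPS-Sum (assumed form): CRPS of the across-dimension sum.

    samples: array of shape (n_samples, T, D); target: array of shape (T, D).
    """
    summed_samples = samples.sum(axis=2)  # shape (n_samples, T)
    summed_target = target.sum(axis=1)    # shape (T,)
    scores = [
        crps_from_samples(summed_samples[:, t], summed_target[t])
        for t in range(target.shape[0])
    ]
    return float(np.mean(scores))


def mean_marginal_crps(samples, target):
    """Average CRPS over every time step and every dimension separately."""
    n_steps, n_dims = target.shape
    scores = [
        crps_from_samples(samples[:, t, d], target[t, d])
        for t in range(n_steps)
        for d in range(n_dims)
    ]
    return float(np.mean(scores))


# Synthetic data (hypothetical): two dimensions with opposite constant levels.
rng = np.random.default_rng(0)
n_samples, n_steps = 200, 20
target = np.stack([np.full(n_steps, 5.0), np.full(n_steps, -5.0)], axis=1)

# A forecast centred on the truth, and the same forecast with its two
# dimensions exchanged, so that every per-dimension prediction is far off.
good = target[None] + 0.1 * rng.standard_normal((n_samples, n_steps, 2))
swapped = good[:, :, ::-1]

score_good = crps_sum(good, target)
score_swapped = crps_sum(swapped, target)
marginal_good = mean_marginal_crps(good, target)
marginal_swapped = mean_marginal_crps(swapped, target)

print("CRPS-Sum (good, swapped):", score_good, score_swapped)
print("Mean marginal CRPS (good, swapped):", marginal_good, marginal_swapped)
```

Under these assumptions the two forecasts receive the same CRPS-Sum, because exchanging dimensions leaves the summed series unchanged, while the per-dimension CRPS of the swapped forecast is far worse.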