Paper title
Using an expert deviation carrying the knowledge of climate data in usual clustering algorithms
Paper authors
Paper abstract
To help physicists expand their knowledge of the climate in the Lesser Antilles, we aim to identify spatio-temporal configurations by applying clustering analysis to wind speed and cumulative rainfall datasets. However, we show that using the L2 norm in conventional clustering methods such as K-Means (KMS) and Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. We therefore propose replacing the Euclidean distance (L2) with a dissimilarity measure named Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence, ED integrates the properties of the observed physical parameters and climate knowledge. This measure helps compare the histograms of four patches, corresponding to geographical zones, that are influenced by atmospheric structures. The internal homogeneity and the separation of the clusters obtained using ED and L2 were evaluated jointly. The results, compared using the silhouette index, show five clusters with high index values. For the two available datasets, KMS-ED, unlike KMS-L2, discriminates well between daily situations, giving more physical meaning to the clusters discovered by the algorithm. The effect of the patches is observed in the spatial analysis of the representative elements for KMS-ED. ED produces distinct configurations, which makes the usual atmospheric structures clearly identifiable. Atmospheric physicists can interpret where each cluster affects a specific zone according to atmospheric structures. KMS-L2 does not offer such interpretability, because the situations it represents are spatially quite smooth. This climatological study illustrates the advantage of using ED as a new approach.
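As an illustration of the symmetrized Kullback-Leibler divergence on which ED is based, the following is a minimal sketch and not the authors' definition of ED. The abstract does not give the exact formula, so several things here are assumptions: the function names `symmetrized_kl` and `patchwise_deviation`, the smoothing constant `eps`, the plain sum over the four patches, and the toy histograms.

```python
import math


def symmetrized_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) between two histograms.

    `eps` is an assumed smoothing constant that avoids log(0) on empty bins.
    """
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]  # normalize counts to probabilities
    q = [x / total_q for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p_i - q_i) * log(p_i / q_i)).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def patchwise_deviation(hists_a, hists_b):
    """Sum the symmetrized KL divergence over corresponding patches.

    Assumption: each argument holds one histogram per patch, and the
    per-patch values are simply added; the paper may combine them differently.
    """
    return sum(symmetrized_kl(ha, hb) for ha, hb in zip(hists_a, hists_b))


# Toy data: two days, four patches each, three histogram bins per patch.
day_a = [[5, 3, 2], [1, 1, 8], [4, 4, 2], [0, 6, 4]]
day_b = [[2, 3, 5], [1, 2, 7], [4, 4, 2], [3, 3, 4]]

d_ab = patchwise_deviation(day_a, day_b)
d_ba = patchwise_deviation(day_b, day_a)
d_aa = patchwise_deviation(day_a, day_a)
print(d_ab, d_ba, d_aa)
```

Unlike the L2 distance computed on the full field, a measure of this kind compares the distribution of values within each zone, so two days differ mainly when their per-patch histograms differ. Because it is symmetric and zero for identical inputs, it could be passed as a precomputed dissimilarity matrix to a clustering routine that accepts one.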