Paper title
Clustering of football players based on performance data and aggregated clustering validity indexes
Paper authors
Abstract
We analyse football (soccer) player performance data with mixed-type variables from the 2014-15 season of eight major European leagues. We cluster these data using a tailor-made dissimilarity measure. To decide between the many available clustering methods and to choose an appropriate number of clusters, we use the approach of Akhanli and Hennig (2020). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen according to the aim of the clustering, which allows a suitable validation index to be defined as a weighted average of calibrated individual indexes measuring the desirable features. We derive two different clusterings. The first is a partition of the data set into major groups of essentially different players, which can be used to analyse a team's composition. The second divides the data set into many small clusters (with 10 players on average), which can be used to find players with a profile very similar to that of a given player. We discuss in depth which characteristics are desirable for these clusterings. The weighting of the criteria for the second clustering is informed by a survey of football experts.
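The aggregation step described in the abstract, a weighted average of calibrated individual indexes, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify how the indexes are calibrated, so z-score standardisation across the candidate clusterings is used here as a stand-in, and the function names, index values, and weights are hypothetical rather than taken from the paper.

```python
from statistics import mean, pstdev


def calibrate(column):
    """Standardise one index across all candidate clusterings.

    Assumption: z-scores are used as a stand-in calibration so that
    indexes on different scales become comparable.
    """
    m = mean(column)
    s = pstdev(column) or 1.0  # a constant index carries no information
    return [(v - m) / s for v in column]


def aggregate_validity(index_values, weights):
    """Weighted average of calibrated indexes, one value per clustering.

    index_values: one row per candidate clustering, one column per
                  validity index (larger is assumed to mean better).
    weights:      one non-negative weight per validity index.
    """
    columns = list(zip(*index_values))
    calibrated = [calibrate(col) for col in columns]
    total = sum(weights)
    return [
        sum(w * col[i] for w, col in zip(weights, calibrated)) / total
        for i in range(len(index_values))
    ]


# Hypothetical values: three candidate clusterings scored on three indexes.
scores = [
    [0.60, 0.20, 0.70],
    [0.55, 0.50, 0.40],
    [0.30, 0.90, 0.20],
]
# Hypothetical weights reflecting how much each characteristic matters.
weights = [1.0, 1.0, 0.5]

aggregated = aggregate_validity(scores, weights)
# Index of the candidate clustering with the highest aggregated value.
best = max(range(len(aggregated)), key=aggregated.__getitem__)
```

Changing the weights changes which candidate clustering ranks highest, which is how a clustering aim (for example, a few well-separated groups versus many small homogeneous clusters) could be encoded in such a scheme.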