Paper title

greed: An R Package for Model-Based Clustering by Greedy Maximization of the Integrated Classification Likelihood

Authors

Etienne Côme, Nicolas Jouvin

Abstract

The greed package implements the general and flexible framework of arXiv:2002.11577 for model-based clustering in the R language. Based on direct maximization of the exact Integrated Classification Likelihood with respect to the partition, it performs clustering and selects the number of groups jointly. This combinatorial problem is handled by an efficient hybrid genetic algorithm, while a final hierarchical step gives access to coarser partitions and extracts an ordering of the clusters. This methodology is applicable to a wide variety of latent variable models and hence can handle various data types as well as heterogeneous data. Classical models for continuous, count, categorical, and graph data are implemented, and new models may be incorporated thanks to S4 class abstraction. This paper introduces the package and the design choices that guided its development, and illustrates its usage on practical use cases.
