Selecting a significance level in sequential testing procedures for community detection

Paper title

Selecting a significance level in sequential testing procedures for community detection

Authors

Ghosh, Riddhi Pratim; Barnett, Ian

Abstract

While numerous sequential algorithms have been developed to estimate community structure in networks, there is little guidance on, and little study of, which significance level or stopping parameter to use in these sequential testing procedures. Most algorithms either prespecify the number of communities or use an arbitrary stopping rule. We provide a principled approach to selecting a nominal significance level for sequential community detection procedures by controlling the tolerance ratio, defined as the ratio of the probability of underfitting to the probability of overfitting the number of clusters when fitting a network. We introduce an algorithm that specifies this significance level from a user-specified tolerance ratio, and demonstrate its utility with a sequential modularity maximization approach in a stochastic block model framework. We evaluate the performance of the proposed algorithm through extensive simulations and demonstrate its utility in controlling the tolerance ratio when clustering single-cell RNA sequencing data by cell type and when clustering a congressional voting network.
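
The trade-off behind the tolerance ratio can be illustrated with a minimal sketch. It assumes a hypothetical toy model, not the authors' SBM and modularity-based procedure: the procedure tests H0: K = k against K > k for k = 1, 2, ... and stops at the first non-rejection; the tests are independent; p-values follow Beta(a, 1) with 0 < a < 1 while k is below the true number of clusters, and Uniform(0, 1) afterwards. Under these assumptions the underfitting and overfitting probabilities have closed forms, the ratio decreases monotonically in alpha, and bisection recovers the alpha that matches a user-specified ratio. The names fit_probabilities, tolerance_ratio and calibrate_alpha are illustrative only.

```python
from __future__ import annotations


def fit_probabilities(alpha: float, k_true: int, a: float) -> tuple[float, float]:
    """Return (P(underfit), P(overfit)) under a hypothetical toy model.

    Toy model (assumption, not the paper's procedure): test H0: K = k vs K > k
    for k = 1, 2, ... and stop at the first non-rejection. Tests are
    independent; for k < k_true the p-value is Beta(a, 1), so it rejects with
    probability alpha**a; for k >= k_true it is Uniform(0, 1), so it rejects
    with probability alpha.
    """
    if k_true < 2:
        raise ValueError("k_true must be at least 2 for underfitting to be possible")
    # Probability of rejecting all k_true - 1 tests where the alternative holds.
    reach_true = (alpha ** a) ** (k_true - 1)
    underfit = 1.0 - reach_true   # stopped before reaching k_true
    overfit = reach_true * alpha  # reached k_true, then falsely rejected once more
    return underfit, overfit


def tolerance_ratio(alpha: float, k_true: int, a: float) -> float:
    """Tolerance ratio: P(underfit) / P(overfit)."""
    underfit, overfit = fit_probabilities(alpha, k_true, a)
    if overfit == 0.0:
        return float("inf")  # overfitting probability underflowed to zero
    return underfit / overfit


def calibrate_alpha(target: float, k_true: int, a: float,
                    iters: int = 200) -> float:
    """Bisection for the alpha whose tolerance ratio equals `target`.

    Under the toy model the ratio is strictly decreasing in alpha: a smaller
    alpha makes rejections rarer, raising underfitting and lowering
    overfitting. The result is clamped to [1e-12, 1 - 1e-12].
    """
    if target <= 0:
        raise ValueError("target tolerance ratio must be positive")
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tolerance_ratio(mid, k_true, a) > target:
            lo = mid  # too much underfitting relative to target: raise alpha
        else:
            hi = mid  # too much overfitting relative to target: lower alpha
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Three true clusters; the user wants underfitting and overfitting equally likely.
    alpha = calibrate_alpha(target=1.0, k_true=3, a=0.2)
    under, over = fit_probabilities(alpha, k_true=3, a=0.2)
    print(f"alpha={alpha:.4f}  P(underfit)={under:.4f}  P(overfit)={over:.4f}")
```

A real procedure would not have closed-form probabilities and would need them estimated some other way, for example by simulation; that step is outside this sketch. The bisection step relies only on the ratio being monotone in alpha.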
