论文标题
在使用顶点协变量的随机块模型图中用于社区检测的光谱算法
On spectral algorithms for community detection in stochastic blockmodel graphs with vertex covariates
论文作者
论文摘要
在网络推理应用程序中,通常需要根据某种相似性来检测社区结构,即将顶点分为组或块。除了仅仅邻接矩阵之外,许多真实网络还涉及顶点协变量,这些协变量具有有关图形中基础结构的关键信息。为了评估此类协变量对块恢复的影响,我们对具有顶点协变量的随机块模型图中的两种基于模型的光谱算法进行了比较分析。第一个算法仅使用邻接矩阵,并直接估计块分配。第二算法将邻接矩阵和顶点协变量同时纳入块分配的估计,并且量化了顶点协变量对块分配的估计值的明确影响。我们采用Chernoff信息来分析算法的性能,并为某些感兴趣的模型得出信息理论的Chernoff比率。分析结果和模拟表明,第二种算法通常是优选的:我们通常可以通过首先估计顶点协变量的效果来更好地估计诱导的块分配。此外,实际数据示例还表明,第二算法具有揭示基本块结构并在实际应用中考虑到的顶点异质性的优点。我们的发现强调了区分可能影响图中块结构的观察到的因素和未观察到的因素的重要性。
In network inference applications, it is often desirable to detect community structure, namely to cluster vertices into groups, or blocks, according to some measure of similarity. Beyond mere adjacency matrices, many real networks also involve vertex covariates that carry key information about underlying block structure in graphs. To assess the effects of such covariates on block recovery, we present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. The first algorithm uses only the adjacency matrix, and directly estimates the block assignments. The second algorithm incorporates both the adjacency matrix and the vertex covariates into the estimation of block assignments, and moreover quantifies the explicit impact of the vertex covariates on the resulting estimate of the block assignments. We employ Chernoff information to analytically compare the algorithms' performance and derive the information-theoretic Chernoff ratio for certain models of interest. Analytic results and simulations suggest that the second algorithm is often preferred: we can often better estimate the induced block assignments by first estimating the effect of vertex covariates. In addition, real data examples also indicate that the second algorithm has the advantages of revealing underlying block structure and taking observed vertex heterogeneity into account in real applications. Our findings emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.