Paper title

Community models for networks observed through edge nominations

Authors

Tianxi Li, Elizaveta Levina, Ji Zhu

Abstract

Communities are a common and widely studied structure in networks, typically under the assumption that the network is fully and correctly observed. In practice, network data are often collected by querying nodes about their connections. In some settings, all edges of a sampled node will be recorded, and in others, a node may be asked to name its connections. These sampling mechanisms introduce noise and bias which can obscure the community structure and invalidate assumptions underlying standard community detection methods. We propose a general model for a class of network sampling mechanisms based on recording edges via querying nodes, designed to improve community detection for network data collected in this fashion. We model edge sampling probabilities as a function of both individual preferences and community parameters, and show community detection can be performed by spectral clustering under this general class of models. We also propose, as a special case of the general framework, a parametric model for directed networks we call the nomination stochastic block model, which allows for meaningful parameter interpretations and can be fitted by the method of moments. Both spectral clustering and the method of moments in this case are computationally efficient and come with theoretical guarantees of consistency. We evaluate the proposed model in simulation studies on both unweighted and weighted networks and apply it to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.
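To make the setting concrete, here is a minimal sketch of spectral clustering on a directed network observed through node reports. It is not the authors' nomination stochastic block model or their estimator: the two-block generator, the uniform reporting probability `keep`, the farthest-point k-means initialisation, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_nominated_sbm(n=200, p_in=0.30, p_out=0.05, keep=0.6, seed=0):
    """Two-block SBM whose edges are observed only when a node reports them.

    Assumption: every node reports each of its true edges independently with
    the same probability `keep`; the paper allows individual preferences.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Undirected "true" network without self-loops.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    true_adj = upper | upper.T
    # Row i holds the edges node i reported, so the observed matrix is directed.
    reported = rng.random((n, n)) < keep
    observed = (true_adj & reported).astype(float)
    return observed, labels


def spectral_clusters(adj, k=2, n_iter=50):
    """Cluster rows of the top-k scaled left singular vectors with k-means."""
    u, s, _ = np.linalg.svd(adj, full_matrices=False)
    emb = u[:, :k] * s[:k]
    # Deterministic farthest-point initialisation of the k centers.
    centers = [emb[0]]
    for _ in range(1, k):
        d = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[d.argmax()])
    centers = np.array(centers)
    # Plain Lloyd iterations.
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = emb[assign == c].mean(axis=0)
    return assign


def agreement(pred, truth):
    """Fraction of matching labels, allowing for a swap of the two labels."""
    acc = np.mean(pred == truth)
    return max(acc, 1 - acc)


adj, labels = simulate_nominated_sbm()
pred = spectral_clusters(adj, k=2)
print("agreement with planted communities:", agreement(pred, labels))
```

Using the singular value decomposition rather than an eigendecomposition is a deliberate choice here, since the reported adjacency matrix is generally not symmetric.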
