Network compression with configuration models and the minimum description length

Paper title

Network compression with configuration models and the minimum description length

Authors

Hébert-Dufresne, Laurent; Young, Jean-Gabriel; Daniels, Alexander; Kirkley, Alec; Allard, Antoine

Abstract

Random network models, constrained to reproduce specific statistical features, are often used to represent and analyze network data and their mathematical descriptions. Chief among them, the configuration model constrains random networks by their degree distribution and is foundational to many areas of network science. However, configuration models and their variants are often selected based on intuition or mathematical and computational simplicity rather than on statistical evidence. To evaluate the quality of a network representation, we need to consider both the amount of information required to specify a random network model and the probability of recovering the original data when using the model as a generative process. To this end, we calculate the approximate size of network ensembles generated by the popular configuration model and its generalizations, including versions accounting for degree correlations and centrality layers. We then apply the minimum description length principle as a model selection criterion over the resulting nested family of configuration models. Using a dataset of over 100 networks from various domains, we find that the classic Configuration Model is generally preferred on networks with an average degree above ten, while a Layered Configuration Model constrained by a centrality metric offers the most compact representation of the majority of sparse networks.
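
The two-part trade-off described in the abstract (bits to specify the model plus bits to single out the observed network within the model's ensemble) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the uniform codes used for the model parameters, and the stub-matching estimate of the ensemble size, which counts multigraph configurations and can badly overestimate the number of simple graphs when the network contains large hubs.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


def log2_binomial(n: int, k: int) -> float:
    """log2 of the binomial coefficient C(n, k)."""
    return log2_factorial(n) - log2_factorial(k) - log2_factorial(n - k)


def dl_random_graph(n: int, m: int) -> float:
    """Description length (bits) under a uniform graph model with n nodes, m edges.

    Assumed encoding: m is sent with a uniform code over 0..C(n, 2),
    then the graph is one of C(C(n, 2), m) equally likely simple graphs.
    """
    pairs = n * (n - 1) // 2
    model_bits = math.log2(pairs + 1)
    data_bits = log2_binomial(pairs, m)
    return model_bits + data_bits


def dl_configuration_model(degrees: list[int]) -> float:
    """Description length (bits) under a degree-constrained model.

    Assumed encoding: m is sent as above, then the degree sequence is sent
    as one of the C(2m + n - 1, n - 1) ways to split 2m stubs among n nodes.
    The ensemble size is estimated as (2m)! / (m! * 2^m * prod(k_i!)),
    i.e. stub matchings divided by stub relabelings (an approximation).
    """
    n = len(degrees)
    stubs = sum(degrees)
    m = stubs // 2
    pairs = n * (n - 1) // 2
    model_bits = math.log2(pairs + 1) + log2_binomial(stubs + n - 1, n - 1)
    data_bits = (
        log2_factorial(stubs)
        - log2_factorial(m)
        - m  # log2(2^m)
        - sum(log2_factorial(k) for k in degrees)
    )
    return model_bits + data_bits


# Hypothetical degree sequence, used only to show the comparison.
example_degrees = [3, 3, 2, 2, 2, 1, 1]
n_nodes = len(example_degrees)
n_edges = sum(example_degrees) // 2
print("uniform graph model:", dl_random_graph(n_nodes, n_edges))
print("configuration model:", dl_configuration_model(example_degrees))
```

Under the minimum description length principle, the model with the smaller total is preferred for that network: a richer model pays more bits for its parameters and is worthwhile only if it shrinks the ensemble by more than that cost.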
