论文标题
基于相似性的链接预测网络流的模块化压缩
Similarity-based Link Prediction from Modular Compression of Network Flows
论文作者
论文摘要
节点相似性分数是用于聚类,节点分类,异常检测以及与生物系统,信息网络和推荐系统中应用的链接预测的图形中的机器学习的基础。有关链接预测的最新工作使用矢量空间嵌入来计算具有良好性能的无向网络中的节点相似性。尽管如此,它们仍有几个缺点:有限的解释性,对高参数调整的需求,通过降低维度拟合的手动模型拟合以及导向链路预测中对称相似性的性能差。我们提出了Mapsim,这是一种基于网络流的模块化压缩评估节点相似性的信息理论措施。与矢量空间嵌入不同,Mapsim在社区的离散,非现场空间中代表节点,并以无监督的方式产生不对称的相似性。我们将链接预测任务上的Mapsim与47个网络中流行的基于嵌入的算法进行比较,发现Mapsim在所有网络中的平均性能比其最接近的竞争对手高7%以上,在47个47个网络中的所有嵌入方法都优于47个网络。我们的方法展示了图表学习中基于压缩的方法的潜力,并在其他图表学习任务中具有有希望的应用。
Node similarity scores are a foundation for machine learning in graphs for clustering, node classification, anomaly detection, and link prediction with applications in biological systems, information networks, and recommender systems. Recent works on link prediction use vector space embeddings to calculate node similarities in undirected networks with good performance. Still, they have several disadvantages: limited interpretability, need for hyperparameter tuning, manual model fitting through dimensionality reduction, and poor performance from symmetric similarities in directed link prediction. We propose MapSim, an information-theoretic measure to assess node similarities based on modular compression of network flows. Unlike vector space embeddings, MapSim represents nodes in a discrete, non-metric space of communities and yields asymmetric similarities in an unsupervised fashion. We compare MapSim on a link prediction task to popular embedding-based algorithms across 47 networks and find that MapSim's average performance across all networks is more than 7% higher than its closest competitor, outperforming all embedding methods in 11 of the 47 networks. Our method demonstrates the potential of compression-based approaches in graph representation learning, with promising applications in other graph learning tasks.