Maximizing Cohesion and Separation in Graph Representation Learning: A Distance-aware Negative Sampling Approach

Paper title

Maximizing Cohesion and Separation in Graph Representation Learning: A Distance-aware Negative Sampling Approach

Authors

Maruf, M., Karpatne, Anuj

Abstract

The objective of unsupervised graph representation learning (GRL) is to learn a low-dimensional space of node embeddings that reflects the structure of a given unlabeled graph. Existing algorithms for this task rely on negative sampling objectives that maximize the similarity of node embeddings at nearby nodes (referred to as "cohesion") by maintaining positive and negative corpora of node pairs. While positive samples are drawn from node pairs that co-occur in short random walks, conventional approaches construct the negative corpus by uniformly sampling random pairs, thus ignoring valuable information about structural dissimilarity among distant node pairs (referred to as "separation"). In this paper, we present a novel Distance-aware Negative Sampling (DNS) method, which maximizes the separation of distant node pairs while maximizing cohesion at nearby node pairs by setting the negative sampling probability proportional to the pairwise shortest distances. Our approach can be used in conjunction with any GRL algorithm, and we demonstrate its efficacy over baseline negative sampling methods on downstream node classification tasks across a number of benchmark datasets and GRL algorithms. All our code and datasets are available at https://github.com/Distance-awareNS/DNS/.
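
The central idea of the abstract, a negative sampling probability proportional to pairwise shortest distance, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`bfs_distances`, `negative_sampling_probs`, `sample_negatives`), the use of unweighted hop counts as the distance, the plain linear proportionality, and the choice to give unreachable nodes zero probability are all assumptions made for illustration.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def negative_sampling_probs(adj, source):
    """Probability of drawing each node as a negative for `source`.

    Assumption: probability is linearly proportional to the shortest-path
    distance; the source itself and unreachable nodes get probability 0.
    """
    dist = bfs_distances(adj, source)
    total = sum(dist.values())
    if total == 0:  # isolated node: no candidates
        return {}
    return {v: d / total for v, d in dist.items() if d > 0}


def sample_negatives(adj, source, k, rng=random):
    """Draw k negatives (with replacement) for `source`."""
    probs = negative_sampling_probs(adj, source)
    nodes = list(probs)
    if not nodes:
        return []
    weights = [probs[v] for v in nodes]
    return rng.choices(nodes, weights=weights, k=k)


# Toy path graph: 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
probs = negative_sampling_probs(adj, 0)
negatives = sample_negatives(adj, 0, k=5, rng=random.Random(0))
```

On the toy path graph, node 3 is three hops from node 0 and is therefore three times as likely to be drawn as a negative as the adjacent node 1. Uniform sampling, the baseline described in the abstract, would treat all three candidates equally.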
