Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks

Paper Title

Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks

Authors

Ping Wang, Khushbu Agarwal, Colby Ham, Sutanay Choudhury, Chandan K. Reddy

Abstract

Representation learning methods for heterogeneous networks produce a low-dimensional vector embedding for each node that is typically fixed for all tasks involving the node. Many of the existing methods focus on obtaining a static vector representation for a node in a way that is agnostic to the downstream application where it is being used. In practice, however, downstream tasks such as link prediction require specific contextual information that can be extracted from the subgraphs related to the nodes provided as input to the task. To tackle this challenge, we develop SLiCE, a framework bridging static representation learning methods using global information from the entire graph with localized attention driven mechanisms to learn contextual node representations. We first pre-train our model in a self-supervised manner by introducing higher-order semantic associations and masking nodes, and then fine-tune our model for a specific link prediction task. Instead of training node representations by aggregating information from all semantic neighbors connected via metapaths, we automatically learn the composition of different metapaths that characterize the context for a specific task without the need for any pre-defined metapaths. SLiCE significantly outperforms both static and contextual embedding learning methods on several publicly available benchmark network datasets. We also interpret the semantic association matrix and provide its utility and relevance in making successful link predictions between heterogeneous nodes in the network.
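
The idea of a contextual node representation can be illustrated with a small sketch. This is not the SLiCE architecture: it uses a single attention head with no learned projections, skips the pre-training and fine-tuning stages entirely, and the node names, vectors, and the helper functions `contextualize` and `link_score` are all hypothetical. It only shows how the same pair of nodes can receive a different link score once their static embeddings are re-weighted by attention over a different context subgraph.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contextualize(static, context):
    """Return context-dependent embeddings for the nodes in `context`.

    static  : dict mapping node id -> static embedding (list of floats)
    context : list of node ids forming the context subgraph

    Each node attends to every node in the context (scaled dot-product
    attention, no learned weights), so its output depends on the context.
    """
    dim = len(static[context[0]])
    scale = math.sqrt(dim)
    out = {}
    for i in context:
        weights = softmax([dot(static[i], static[j]) / scale for j in context])
        out[i] = [
            sum(w * static[j][k] for w, j in zip(weights, context))
            for k in range(dim)
        ]
    return out


def link_score(static, context, u, v):
    """Sigmoid of the dot product of the contextual embeddings of u and v."""
    ctx = contextualize(static, context)
    return 1.0 / (1.0 + math.exp(-dot(ctx[u], ctx[v])))


# Toy static embeddings (made-up values, for illustration only).
static = {
    "author_1": [1.0, 0.0],
    "paper_1": [0.8, 0.2],
    "venue_1": [0.0, 1.0],
}

# Same node pair, two different context subgraphs.
score_a = link_score(static, ["author_1", "paper_1"], "author_1", "paper_1")
score_b = link_score(
    static, ["author_1", "paper_1", "venue_1"], "author_1", "paper_1"
)
```

With static embeddings alone, the score for `author_1` and `paper_1` would be the same in every task. Here `score_a` and `score_b` differ because adding `venue_1` to the context changes the attention weights and therefore both contextual embeddings.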
