Skip Vectors for RDF Data: Extraction Based on the Complexity of Feature Patterns

Paper Title

Skip Vectors for RDF Data: Extraction Based on the Complexity of Feature Patterns

Authors

Yota Minami, Ken Kaneiwa

Abstract

The Resource Description Framework (RDF) is a framework for describing metadata, such as attributes and relationships of resources on the Web. Machine learning tasks for RDF graphs adopt three methods: (i) support vector machines (SVMs) with RDF graph kernels, (ii) RDF graph embeddings, and (iii) relational graph convolutional networks. In this paper, we propose a novel feature vector (called a Skip vector) that represents some features of each resource in an RDF graph by extracting various combinations of neighboring edges and nodes. In order to make the Skip vector low-dimensional, we select important features for classification tasks based on the information gain ratio of each feature. The classification tasks can be performed by applying the low-dimensional Skip vector of each resource to conventional machine learning algorithms, such as SVMs, the k-nearest neighbors method, neural networks, random forests, and AdaBoost. In our evaluation experiments with RDF data, such as Wikidata, DBpedia, and YAGO, we compare our method with RDF graph kernels in an SVM. We also compare our method with two other approaches, RDF graph embeddings such as RDF2vec and relational graph convolutional networks, on the AIFB, MUTAG, BGS, and AM benchmarks.
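
The feature-selection step mentioned in the abstract (ranking features by information gain ratio so that the vector stays low-dimensional) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `entropy`, `gain_ratio`, and `select_features` are hypothetical, and the sketch assumes each resource has already been turned into a vector of discrete feature values, which the abstract does not specify in detail.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def gain_ratio(feature_values, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


def select_features(vectors, labels, k):
    """Return the indices of the k features with the highest gain ratio."""
    n_features = len(vectors[0])
    scores = [gain_ratio([v[j] for v in vectors], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (an assumption for illustration): 4 resources, 3 binary features.
vectors = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
labels = ["a", "a", "b", "b"]
selected = select_features(vectors, labels, k=1)
```

In this toy data the first feature separates the two classes perfectly, the second is uninformative, and the third is constant, so only the first feature is kept. The reduced vectors could then be passed to any conventional classifier of the kind the abstract lists.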

Paper Title