Metric Learning for Ordered Labeled Trees with pq-grams

Paper title

Metric Learning for Ordered Labeled Trees with pq-grams

Authors

Hikaru Shindo, Masaaki Nishino, Yasuaki Kobayashi, Akihiro Yamamoto

Abstract

Computing the similarity between two data points plays a vital role in many machine learning algorithms. Metric learning aims to learn a good metric automatically from data. Most existing studies on metric learning for tree-structured data learn the tree edit distance. However, the edit distance is not amenable to big data analysis because of its high computational cost. In this paper, we propose a new metric learning approach for tree-structured data based on pq-grams. The pq-gram distance is a distance for ordered labeled trees and has a much lower computational cost than the tree edit distance. To perform metric learning based on pq-grams, we propose a new differentiable parameterized distance, the weighted pq-gram distance. We also propose a way to learn this distance based on Large Margin Nearest Neighbors (LMNN), a well-studied and practical metric learning scheme. We formulate the metric learning problem as an optimization problem and solve it with gradient descent. We empirically show that the proposed approach not only achieves results competitive with state-of-the-art edit-distance-based methods on various classification problems, but also solves those problems much more rapidly than the edit-distance-based methods.
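
To make the pq-gram distance mentioned in the abstract concrete, here is a minimal Python sketch of the standard unweighted pq-gram profile and the resulting bag distance. It is an illustration and not the authors' implementation. The tree encoding `(label, [children])`, the function names, and the padding label `*` are assumptions made for this example. The optional `weights` argument is a hypothetical per-pq-gram weighting that only hints at how a parameterized distance might look; the paper's actual definition of the weighted pq-gram distance may differ.

```python
from collections import Counter

NULL = "*"  # assumed padding label for missing ancestors and children


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of a tree encoded as (label, [children])."""
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = ancestors + (label,)   # p labels: p-1 ancestors plus the node itself
        base = (NULL,) * q            # sliding window over the node's children
        if not children:
            profile[stem + base] += 1  # a leaf contributes one all-null base
            return
        for child in children:
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem[1:])
        for _ in range(q - 1):         # slide the window past the last child
            base = base[1:] + (NULL,)
            profile[stem + base] += 1

    visit(tree, (NULL,) * (p - 1))
    return profile


def pq_gram_distance(t1, t2, p=2, q=3, weights=None):
    """Size of the bag symmetric difference of the two profiles.

    `weights` (hypothetical) maps a pq-gram to a non-negative weight;
    pq-grams without an entry get weight 1.0.
    """
    a, b = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    weights = weights or {}
    return sum(weights.get(g, 1.0) * abs(a[g] - b[g]) for g in a.keys() | b.keys())


t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
print(pq_gram_distance(t1, t2))  # 8.0
```

With p = 2 and q = 3, each of the two example trees yields six pq-grams, two of which they share, so the unweighted distance is 6 + 6 - 2 * 2 = 8. Building a profile takes a single traversal of the tree, which is where the lower cost relative to the tree edit distance comes from.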
