Grale: Designing Networks for Graph Learning

Paper Title

Grale: Designing Networks for Graph Learning

Authors

Jonathan Halcrow, Alexandru Moşoi, Sam Ruth, Bryan Perozzi

Abstract

How can we find the right graph for semi-supervised learning? In real-world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static. In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed for running on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes and hundreds of trillions of potential edges to score. By employing locality-sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task-specific model and build the associated nearest neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise. We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of millions of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone.
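
The abstract describes using locality-sensitive hashing to limit which node pairs are scored before a nearest neighbor graph is built. The following is a minimal sketch of that general idea, not Grale's implementation: the random-hyperplane hash, the cosine scorer (a stand-in for the learned task-specific model), and all function names and parameters (`simhash_key`, `candidate_pairs`, `build_graph`, `num_bits`, `num_tables`, `k`) are assumptions made for illustration.

```python
import itertools
import math
import random
from collections import defaultdict


def simhash_key(vec, hyperplanes):
    """Return one sign bit per random hyperplane (assumed hash family)."""
    return tuple(
        sum(v * h for v, h in zip(vec, plane)) >= 0.0 for plane in hyperplanes
    )


def cosine(a, b):
    """Stand-in pair scorer; the paper learns a task-specific model instead."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pairs(points, num_bits=4, num_tables=3, seed=0):
    """Hash points into buckets; only pairs sharing a bucket become candidates."""
    rng = random.Random(seed)
    dim = len(next(iter(points.values())))
    pairs = set()
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        buckets = defaultdict(list)
        for node_id, vec in points.items():
            buckets[simhash_key(vec, planes)].append(node_id)
        for members in buckets.values():
            pairs.update(itertools.combinations(sorted(members), 2))
    return pairs


def build_graph(points, k=2, **lsh_kwargs):
    """Score candidate pairs only, then keep the top-k neighbors per node."""
    scored = defaultdict(list)
    for u, v in candidate_pairs(points, **lsh_kwargs):
        s = cosine(points[u], points[v])
        scored[u].append((s, v))
        scored[v].append((s, u))
    return {u: sorted(nbrs, reverse=True)[:k] for u, nbrs in scored.items()}
```

In this sketch, scoring cost depends on bucket sizes rather than on the number of all possible pairs, which is the property the abstract credits for reducing build time. Using several hash tables is a common way to recover similar pairs that a single table would separate.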
