论文标题
Primgraph:具有概率设置表示的高性能和高准确图挖掘
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations
论文作者
论文摘要
重要的图挖掘问题(例如聚类)是计算要求的。为了显着加速这些问题,我们提出了批处理:一个图表表示,可以在工作,深度和结果准确性上具有强大的理论保证,可以简单而快速的平行图挖掘。关键想法是使用概率集表示(例如Bloom滤波器)表示顶点集。这些表示的过程比原始顶点集要快得多。我们将这些表示形式用作重要的平行图挖掘算法中的构建块,例如集合计数或聚类。当使用Primgraph增强时,这些算法的表现显着优于平行的精确基线(在32个内核上近50倍),同时确保许多输入图数据集的准确性超过90%。我们基于具有理想统计属性的概率集表示的新型界限和算法对数据分析社区具有独立的兴趣。
Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.