从几个随机种子中近似sublenear时间的dasgupta成本

论文标题

从几个随机种子中近似sublenear时间的dasgupta成本

Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds

论文作者

Kapralov, Michael, Kumar, Akash, Lattanzi, Silvio, Mousavifar, Aida, Wrzos-Kaminska, Weronika

论文摘要

自Goldreich和Ron [stoc'96]在扩展测试中的基础工作以来，测试图群集结构一直是财产测试研究的中心对象，即区分单个群集（一个扩展器）和图形远非单个群集的问题。更一般而言，$（k，ε）$ - 可簇图$ g $是一个图形，其顶点套件可以将分区置于$ k $诱导的扩张器中，每个分区都以$ε$为界。 A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to $(k, ε)$-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate $\approx ε$, but no sublinear time algorithms for learning the structure of inter-cluster connections are known.作为一个简单的例子，一个人可以在局部区分形成线路和集团的“群集图”？在本文中，我们考虑了测试$（k，ε）$ - 可簇时间簇的层次集群结构的问题。我们衡量层次凝聚力的度量是良好的Dasgupta成本，我们的主要结果是一种算法，该算法使用少数随机选择的种子Vertices近似于$（k，ε）$ - 可簇图的Dasgupta成本（k，ε）$ - 可簇图。 Our main result is an $O(\sqrt{\log k})$ approximation to Dasgupta cost of $G$ in $\approx n^{1/2+O(ε)}$ time using $\approx n^{1/3}$ seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable图。据我们所知，我们的第一个结果是在均方根时间近似此类图的分层聚类属性。

Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC'96] on expansion testing, i.e. the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a $(k, ε)$-clusterable graph $G$ is a graph whose vertex set admits a partition into $k$ induced expanders, each with outer conductance bounded by $ε$. A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to $(k, ε)$-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate $\approx ε$, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the `cluster graph' forming a line and a clique? In this paper, we consider the problem of testing the hierarchical cluster structure of $(k, ε)$-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and our main result is an algorithm that approximates Dasgupta cost of a $(k, ε)$-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Our main result is an $O(\sqrt{\log k})$ approximation to Dasgupta cost of $G$ in $\approx n^{1/2+O(ε)}$ time using $\approx n^{1/3}$ seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.

下载PDF全文

下载文献需遵守相关版权规定

论文标题