论文标题
从几个随机种子中近似sublenear时间的dasgupta成本
Approximating Dasgupta Cost in Sublinear Time from a Few Random Seeds
论文作者
论文摘要
自Goldreich和Ron [stoc'96]在扩展测试中的基础工作以来,测试图群集结构一直是财产测试研究的中心对象,即区分单个群集(一个扩展器)和图形远非单个群集的问题。更一般而言,$(k,ε)$ - 可簇图$ g $是一个图形,其顶点套件可以将分区置于$ k $诱导的扩张器中,每个分区都以$ε$为界。 A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to $(k, ε)$-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate $\approx ε$, but no sublinear time algorithms for learning the structure of inter-cluster connections are known.作为一个简单的例子,一个人可以在局部区分形成线路和集团的“群集图”? 在本文中,我们考虑了测试$(k,ε)$ - 可簇时间簇的层次集群结构的问题。我们衡量层次凝聚力的度量是良好的Dasgupta成本,我们的主要结果是一种算法,该算法使用少数随机选择的种子Vertices近似于$(k,ε)$ - 可簇图的Dasgupta成本(k,ε)$ - 可簇图。 Our main result is an $O(\sqrt{\log k})$ approximation to Dasgupta cost of $G$ in $\approx n^{1/2+O(ε)}$ time using $\approx n^{1/3}$ seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable图。据我们所知,我们的第一个结果是在均方根时间近似此类图的分层聚类属性。
Testing graph cluster structure has been a central object of study in property testing since the foundational work of Goldreich and Ron [STOC'96] on expansion testing, i.e. the problem of distinguishing between a single cluster (an expander) and a graph that is far from a single cluster. More generally, a $(k, ε)$-clusterable graph $G$ is a graph whose vertex set admits a partition into $k$ induced expanders, each with outer conductance bounded by $ε$. A recent line of work initiated by Czumaj, Peng and Sohler [STOC'15] has shown how to test whether a graph is close to $(k, ε)$-clusterable, and to locally determine which cluster a given vertex belongs to with misclassification rate $\approx ε$, but no sublinear time algorithms for learning the structure of inter-cluster connections are known. As a simple example, can one locally distinguish between the `cluster graph' forming a line and a clique? In this paper, we consider the problem of testing the hierarchical cluster structure of $(k, ε)$-clusterable graphs in sublinear time. Our measure of hierarchical clusterability is the well-established Dasgupta cost, and our main result is an algorithm that approximates Dasgupta cost of a $(k, ε)$-clusterable graph in sublinear time, using a small number of randomly chosen seed vertices for which cluster labels are known. Our main result is an $O(\sqrt{\log k})$ approximation to Dasgupta cost of $G$ in $\approx n^{1/2+O(ε)}$ time using $\approx n^{1/3}$ seeds, effectively giving a sublinear time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.