论文标题

$ k $ cliques的均等时间采样的几乎最佳范围:采样集团比计数难得难

Almost Optimal Bounds for Sublinear-Time Sampling of $k$-Cliques: Sampling Cliques is Harder Than Counting

论文作者

Eden, Talya, Ron, Dana, Rosenbaum, Will

论文摘要

在这项工作中,我们考虑了从sublinear时间中几乎均匀分布中的图表中在一般图形查询模型中的几乎均匀分布中采样的问题。具体来说,该算法应输出每个$ k $ -clique,其中概率$(1 \pmε)/n_k $,其中$ n_k $表示图中的$ k $ -cliques的数量,而$ε$是给定的近似参数。 我们证明,此问题的查询复杂性是\ [θ^*\ left(\ max \ left \ {\ left(\ frac {(nα)^{k/2}}} {n_k} {n_k} \ right) \ min \ left \ {nα,\ frac {nα^{k-1}} {n_k} \ right \} \ right \} \ right \} \ right)。 \]其中$ n $是图表中的顶点数,$α$是其营养性,$θ^*$抑制了对$(\ log n/ε)^{o(k)} $的依赖性。有趣的是,这建立了近似计数和均匀度统一采样之间的分离。例如,如果$ k = 3 $,$α= o(1)$,而$ n_3 $(三角形的数量)为$θ(n)$,那么我们得到了$ω(n^{1/4})$的下限(n^{1/4})$(对于常数$ε$),而在这些情况下,$(1 \ pm pm pm pmpimpmpimpmpimpmimation of-y 33 $) $ \ textrm {poly}(\ log(n/ε))$ QUERIES(EDEN,RON和SESHADHRI,SODA20)。 我们的下边界是由建造具有ARBORICITY $α$的图形家庭的结构,使得在每个图中都有$ n_k $ cliques(尺寸$ k $),其中其中一个被“隐藏”,因此很难采样。我们的上限是基于定义特殊的辅助图$ H_K $,以便在$ H_K $中几乎均匀地采样边缘,转化为在原始图$ G $中几乎均匀地采样$ k $ -cliques。然后,我们以已知的边缘采样算法(Eden,Ron和Rosenbaum,ICALP19)为基础,以$ H_K $采样边缘,在$ H_K $中,挑战是将查询模拟为$ H_K $,而仅给出$ G $。

In this work, we consider the problem of sampling a $k$-clique in a graph from an almost uniform distribution in sublinear time in the general graph query model. Specifically the algorithm should output each $k$-clique with probability $(1\pm ε)/n_k$, where $n_k$ denotes the number of $k$-cliques in the graph and $ε$ is a given approximation parameter. We prove that the query complexity of this problem is \[ Θ^*\left(\max\left\{ \left(\frac{(nα)^{k/2}}{ n_k}\right)^{\frac{1}{k-1}} ,\; \min\left\{nα,\frac{nα^{k-1}}{n_k} \right\}\right\}\right). \] where $n$ is the number of vertices in the graph, $α$ is its arboricity, and $Θ^*$ suppresses the dependence on $(\log n/ε)^{O(k)}$. Interestingly, this establishes a separation between approximate counting and approximate uniform sampling in the sublinear regime. For example, if $k=3$, $α= O(1)$, and $n_3$ (the number of triangles) is $Θ(n)$, then we get a lower bound of $Ω(n^{1/4})$ (for constant $ε$), while under these conditions, a $(1\pm ε)$-approximation of $n_3$ can be obtained by performing $\textrm{poly}(\log(n/ε))$ queries (Eden, Ron and Seshadhri, SODA20). Our lower bound follows from a construction of a family of graphs with arboricity $α$ such that in each graph there are $n_k$ cliques (of size $k$), where one of these cliques is "hidden" and hence hard to sample. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling $k$-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in $H_k$, where the challenge is simulate queries to $H_k$ while being given access only to $G$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源