论文标题
重击估计的查询复杂性
Query complexity of heavy hitter estimation
论文作者
论文摘要
我们考虑确定元素的子集$ \ MATHCAL {s}^γ_{\ Mathcal {p}} $,以支持基础分布$ \ Mathcal {p} $,其概率值的概率值大于给定的threshold $γ$,从而使eRacle询问ocar $ ___________________ $ i.i.d. $样本从$ \ Mathcal {p} $中绘制。我们考虑两个查询型号:$(a)$每个查询是索引$ i $,oracle返回值$ x_i $和$(b)$每个查询是一对$(i,j)$,而Oracle如果$ x_i = x_j $或否。对于这些查询模型,我们设计顺序估计算法,在每回合中,要么根据整个响应的历史来决定要发送哪种查询到甲骨文,要么决定停止并输出$ \ MATHCAL {s}^γ_{\ Mathcal {\ Mathcal {p}} $正确的大小概率的估计值,以确定了一些大小的概率。我们为任何分布$ \ Mathcal {p} $的算法的查询复杂性提供了上限,并且在两个查询模型下的最佳查询复杂性上也得出了下限。我们还考虑了两个查询模型的嘈杂版本,并提出了可靠的估计器,这些估计器可以有效地抵消Oracle响应中的噪声。
We consider the problem of identifying the subset $\mathcal{S}^γ_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$ whose probability value is larger than a given threshold $γ$, by actively querying an oracle to gain information about a sequence $X_1, X_2, \ldots$ of $i.i.d.$ samples drawn from $\mathcal{P}$. We consider two query models: $(a)$ each query is an index $i$ and the oracle return the value $X_i$ and $(b)$ each query is a pair $(i,j)$ and the oracle gives a binary answer confirming if $X_i = X_j$ or not. For each of these query models, we design sequential estimation algorithms which at each round, either decide what query to send to the oracle depending on the entire history of responses or decide to stop and output an estimate of $\mathcal{S}^γ_{\mathcal{P}}$, which is required to be correct with some pre-specified large probability. We provide upper bounds on the query complexity of the algorithms for any distribution $\mathcal{P}$ and also derive lower bounds on the optimal query complexity under the two query models. We also consider noisy versions of the two query models and propose robust estimators which can effectively counter the noise in the oracle responses.