Title
A Scalable Shannon Entropy Estimator
Authors
Abstract
We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a set $S$ and returns a sample from the distribution obtained by conditioning on $S$, together with the probability of that sample in the distribution. Our work is motivated by applications of such algorithms in Quantitative Information Flow (QIF) analysis in programming-language-based security. Here, information-theoretic quantities capture the effort required on the part of an adversary to obtain access to confidential information. These applications demand accurate measurements when the entropy is small. Existing algorithms that do not use conditional samples require a number of queries that scales inversely with the entropy, which is unacceptable in this regime; indeed, a lower bound by Batu et al. (STOC 2002) established that no algorithm using only sampling and evaluation oracles can obtain acceptable performance. On the other hand, prior work in the conditional sampling model by Chakraborty et al. (SICOMP 2016) only obtained a high-order polynomial query complexity, $\mathcal{O}(\frac{m^7}{\varepsilon^8}\log\frac{1}{\delta})$ queries, to obtain additive $\varepsilon$-approximations on a domain of size $\mathcal{O}(2^m)$. We obtain multiplicative $(1+\varepsilon)$-approximations using only $\mathcal{O}(\frac{m}{\varepsilon^2}\log\frac{1}{\delta})$ queries to the probability-revealing conditional sampling oracle. Moreover, we obtain small, explicit constants, and demonstrate that our algorithm achieves a substantial improvement in practice over the previous state-of-the-art methods used for entropy estimation in QIF.
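To make the oracle model concrete, the following is a minimal sketch (not the paper's algorithm) of a probability-revealing conditional sampling oracle over an explicit finite distribution, together with a naive Monte-Carlo entropy estimate $\hat{H} \approx \mathbb{E}_x[-\log_2 p(x)]$ obtained by querying with $S$ equal to the full domain. The class and variable names here are illustrative assumptions, not identifiers from the paper.

```python
import math
import random


class ProbRevealingCondOracle:
    """Illustrative probability-revealing conditional sampling oracle.

    On a query set S, it draws a sample from the distribution conditioned
    on S and also reveals that sample's probability in the *unconditioned*
    distribution, matching the oracle interface described in the abstract.
    """

    def __init__(self, probs):
        # probs: dict mapping element -> probability; assumed to sum to 1.
        self.probs = probs

    def query(self, S):
        support = [x for x in S if self.probs.get(x, 0.0) > 0.0]
        mass = sum(self.probs[x] for x in support)
        if mass == 0.0:
            raise ValueError("conditioning set has zero probability mass")
        weights = [self.probs[x] / mass for x in support]
        x = random.choices(support, weights=weights, k=1)[0]
        # Return the sample and its probability in the full distribution.
        return x, self.probs[x]


# Naive plug-in estimate of H(p) = E[-log2 p(x)] via full-domain queries.
random.seed(0)
oracle = ProbRevealingCondOracle({"a": 0.5, "b": 0.25, "c": 0.25})
n = 20000
est = sum(-math.log2(oracle.query({"a", "b", "c"})[1]) for _ in range(n)) / n
```

This plain sampling estimator needs a number of samples growing with the variance of $-\log_2 p(x)$, which is exactly what degrades when the entropy is small; the conditional queries in the paper are what allow the stated $\mathcal{O}(\frac{m}{\varepsilon^2}\log\frac{1}{\delta})$ bound, and are not captured by this naive loop.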