论文标题
快速加载的骰子辊:一个近距离的精确采样器,用于离散概率分布
The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
论文作者
论文摘要
本文介绍了一种新算法,该算法是针对使用独立和无偏的随机硬币翻转源来从离散概率分布中产生随机整数的基本问题的。我们证明,该算法称为快速加载的骰子辊(FLDR),在时空和时间上都高效:(i)采样器的大小保证在编码输入分布所需的位数中是线性的; (ii)每个样品消耗的熵的预期数量最多比理论上的最佳速率高6位。我们使用未签名的整数算术介绍了线性时间预处理和近距离采样算法的快速实现。对一组广泛概率分布的经验评估表明,在预处理和采样中,FLDR比多个基线算法(包括广泛使用的别名和间隔采样器)快2x-10x。它还比理论上最佳采样器的信息还要少10000倍,但牺牲了低于1.5倍的运行时开销。
This paper introduces a new algorithm for the fundamental problem of generating a random integer from a discrete probability distribution using a source of independent and unbiased random coin flips. We prove that this algorithm, which we call the Fast Loaded Dice Roller (FLDR), is highly efficient in both space and time: (i) the size of the sampler is guaranteed to be linear in the number of bits needed to encode the input distribution; and (ii) the expected number of bits of entropy it consumes per sample is at most 6 bits more than the information-theoretically optimal rate. We present fast implementations of the linear-time preprocessing and near-optimal sampling algorithms using unsigned integer arithmetic. Empirical evaluations on a broad set of probability distributions establish that FLDR is 2x-10x faster in both preprocessing and sampling than multiple baseline algorithms, including the widely-used alias and interval samplers. It also uses up to 10000x less space than the information-theoretically optimal sampler, at the expense of less than 1.5x runtime overhead.