Paper Title


The Sparse Hausdorff Moment Problem, with Application to Topic Models

Paper Authors

Spencer Gordon, Bijan Mazaheri, Leonard J. Schulman, Yuval Rabani

Paper Abstract


We consider the problem of identifying, from its first $m$ noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on $m$ observable binary random variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing this with $m=2k$, which is the minimum $m$ for which verifying that the source is a $k$-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models. We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables using a sample of size $\left(1/w_{\min}\right)^2 \cdot\left(1/\zeta\right)^{O(k)}$ and post-sampling runtime of only $O(k^{2+o(1)})$ arithmetic operations. Here $w_{\min}$ is the minimum probability of an outcome of $U$, and $\zeta$ is the minimum separation between the distinct success probabilities of the $X_i$s. Stated in terms of the moment problem, it suffices to know the moments to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$. It is known that the sample complexity of any solution to the identification problem must be at least exponential in $k$. Previous results demonstrated either worse sample complexity and worse $O(k^c)$ runtime for some $c$ substantially larger than $2$, or similar sample complexity and much worse $k^{O(k^2)}$ runtime.
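To make the equivalence stated in the abstract concrete: the $j$-th moment of the mixing distribution, $\sum_i w_i p_i^j$, equals the probability that the first $j$ observed bits are all $1$, since the $X_i$ are iid Bernoulli($p_U$) given $U$. The following Python sketch (not the paper's identification algorithm) simulates this hidden-variable model and compares empirical and exact moments; the values of $k$, the weights $w$, the success probabilities $p$, and the sample size are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch of the hidden-variable model from the abstract (illustrative
# parameters, not the paper's algorithm): a hidden U with weights w selects a
# success probability p_U, and X_1..X_m are iid Bernoulli(p_U) given U.

rng = np.random.default_rng(0)

k = 3                                  # support size of the mixing distribution (assumed)
m = 2 * k                              # number of observable binary variables
w = np.array([0.5, 0.3, 0.2])          # hypothetical mixture weights (w_min = 0.2)
p = np.array([0.15, 0.5, 0.85])        # hypothetical success probabilities (zeta = min separation)

n = 200_000                            # number of iid samples of (X_1, ..., X_m)
u = rng.choice(k, size=n, p=w)         # hidden variable U for each sample
x = rng.random((n, m)) < p[u, None]    # X_1..X_m iid Bernoulli(p_U) given U

# Empirical j-th moment: fraction of samples with X_1 = ... = X_j = 1.
emp_moments = [x[:, :j].all(axis=1).mean() for j in range(1, m + 1)]

# Exact j-th moment of the mixing distribution: sum_i w_i * p_i^j.
exact_moments = [(w * p**j).sum() for j in range(1, m + 1)]

for j, (e, t) in enumerate(zip(emp_moments, exact_moments), start=1):
    print(f"moment {j}: empirical {e:.4f}  exact {t:.4f}")
```

With $m=2k$ such noisy moments known to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$, the paper's algorithm recovers the weights and success probabilities of the $k$-mixture.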
