Paper Title


The Sparse Hausdorff Moment Problem, with Application to Topic Models

Paper Authors

Spencer Gordon, Bijan Mazaheri, Leonard J. Schulman, Yuval Rabani

Paper Abstract


We consider the problem of identifying, from its first $m$ noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on $m$ observable binary random variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing this with $m=2k$, which is the minimum $m$ for which verifying that the source is a $k$-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models. We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables using a sample of size $\left(1/w_{\min}\right)^2 \cdot\left(1/\zeta\right)^{O(k)}$ and post-sampling runtime of only $O(k^{2+o(1)})$ arithmetic operations. Here $w_{\min}$ is the minimum probability of an outcome of $U$, and $\zeta$ is the minimum separation between the distinct success probabilities of the $X_i$s. Stated in terms of the moment problem, it suffices to know the moments to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$. It is known that the sample complexity of any solution to the identification problem must be at least exponential in $k$. Previous results demonstrated either worse sample complexity and worse $O(k^c)$ runtime for some $c$ substantially larger than $2$, or similar sample complexity and much worse $k^{O(k^2)}$ runtime.
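To make the equivalence stated in the abstract concrete: the $j$-th moment of the mixing distribution, $\sum_i w_i p_i^j$, equals the probability that the first $j$ observed bits are all $1$, since the $X_i$ are iid Bernoulli($p_U$) given $U$. The following Python sketch (not the paper's identification algorithm) simulates this hidden-variable model and compares empirical and exact moments; the values of $k$, the weights $w$, the success probabilities $p$, and the sample size are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch of the hidden-variable model from the abstract (illustrative
# parameters, not the paper's algorithm): a hidden U with weights w selects a
# success probability p_U, and X_1..X_m are iid Bernoulli(p_U) given U.

rng = np.random.default_rng(0)

k = 3                                  # support size of the mixing distribution (assumed)
m = 2 * k                              # number of observable binary variables
w = np.array([0.5, 0.3, 0.2])          # hypothetical mixture weights (w_min = 0.2)
p = np.array([0.15, 0.5, 0.85])        # hypothetical success probabilities (zeta = min separation)

n = 200_000                            # number of iid samples of (X_1, ..., X_m)
u = rng.choice(k, size=n, p=w)         # hidden variable U for each sample
x = rng.random((n, m)) < p[u, None]    # X_1..X_m iid Bernoulli(p_U) given U

# Empirical j-th moment: fraction of samples with X_1 = ... = X_j = 1.
emp_moments = [x[:, :j].all(axis=1).mean() for j in range(1, m + 1)]

# Exact j-th moment of the mixing distribution: sum_i w_i * p_i^j.
exact_moments = [(w * p**j).sum() for j in range(1, m + 1)]

for j, (e, t) in enumerate(zip(emp_moments, exact_moments), start=1):
    print(f"moment {j}: empirical {e:.4f}  exact {t:.4f}")
```

With $m=2k$ such noisy moments known to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$, the paper's algorithm recovers the weights and success probabilities of the $k$-mixture.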
