从成对边际恢复离散随机变量的关节概率

论文标题

从成对边际恢复离散随机变量的关节概率

Recovering Joint Probability of Discrete Random Variables from Pairwise Marginals

论文作者

Ibrahim, Shahana, Fu, Xiao

论文摘要

学习随机变量（RVS）的联合概率是统计信号处理和机器学习的基石。但是，由于维度的诅咒，通常不可能对高维关节概率进行直接非参数估计。最近的工作提出了从三维边缘的任意数量的RV回收联合概率质量函数（PMF），利用低级数张量分解的代数特性和RV之间的（未知）依赖性。尽管如此，就样本复杂性而言，准确估算三维边际的估计仍然可能是昂贵的，从而影响了样本饥饿的制度在实践中的这一工作的表现。使用三维边缘还涉及挑战张量分解问题，这些问题尚不清楚。这项工作为仅使用成对边际学习关节PMF的新框架提供了一个新的框架，该框架自然地相对于三阶的样本复杂性较低。开发了一个耦合的非负矩阵分解（CNMF）框架，并在各种条件下分析了其关节PMF回收保证。我们的方法还具有革兰氏阴性（GS）类似于竞争性运行时性能的算法。在合理条件下，该算法可证明可以在有限迭代中恢复关节PMF，直至有限的误差。还表明，最近提出的经济期望最大化（EM）算法可以确保改善类似GS的算法的输出，从而进一步提高准确性和效率。使用Real-DATA实验来展示有效性。

Learning the joint probability of random variables (RVs) is the cornerstone of statistical signal processing and machine learning. However, direct nonparametric estimation for high-dimensional joint probability is in general impossible, due to the curse of dimensionality. Recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of RVs from three-dimensional marginals, leveraging the algebraic properties of low-rank tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals can still be costly in terms of sample complexity, affecting the performance of this line of work in practice in the sample-starved regime. Using three-dimensional marginals also involves challenging tensor decomposition problems whose tractability is unclear. This work puts forth a new framework for learning the joint PMF using only pairwise marginals, which naturally enjoys a lower sample complexity relative to the third-order ones. A coupled nonnegative matrix factorization (CNMF) framework is developed, and its joint PMF recovery guarantees under various conditions are analyzed. Our method also features a Gram--Schmidt (GS)-like algorithm that exhibits competitive runtime performance. The algorithm is shown to provably recover the joint PMF up to bounded error in finite iterations, under reasonable conditions. It is also shown that a recently proposed economical expectation maximization (EM) algorithm guarantees to improve upon the GS-like algorithm's output, thereby further lifting up the accuracy and efficiency. Real-data experiments are employed to showcase the effectiveness.

下载PDF全文

下载文献需遵守相关版权规定

论文标题