Paper title
Wasserstein Distributional Learning
Paper authors
Paper abstract
Learning conditional densities and identifying factors that influence the entire distribution are vital tasks in data-driven applications. Conventional approaches work mostly with summary statistics and are hence inadequate for a comprehensive investigation. Recently, functional regression methods have been developed to model density curves as functional outcomes. A major challenge in developing such models lies in the inherent constraints of non-negativity and unit integral on the functional space of density outcomes. To overcome this fundamental issue, we propose Wasserstein Distributional Learning (WDL), a flexible density-on-scalar regression modeling framework that starts with the Wasserstein distance $W_2$ as a proper metric for the space of density outcomes. We then introduce a heterogeneous and flexible class of Semi-parametric Conditional Gaussian Mixture Models (SCGMM) as the model class $\mathfrak{F} \otimes \mathcal{T}$. The resulting metric space $(\mathfrak{F} \otimes \mathcal{T}, W_2)$ satisfies the required constraints and offers a dense and closed functional subspace. To fit the proposed model, we further develop an efficient algorithm based on Majorization-Minimization optimization with boosted trees. Compared with methods in the previous literature, WDL better characterizes and uncovers the nonlinear dependence of the conditional densities and of their derived summary statistics. We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.
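As an illustration of the metric the abstract builds on, the sketch below approximates $W_2$ between two one-dimensional Gaussian mixtures. It relies on the standard fact that, for distributions on the real line, $W_2$ equals the $L^2$ distance between their quantile functions. This is not the WDL fitting algorithm: the function names `mixture_quantiles` and `w2_distance`, the grid sizes, and the tuple layout `(weights, means, sds)` are all assumptions made for this example.

```python
import math
import numpy as np

# Vectorized error function (NumPy has no built-in erf).
_erf = np.vectorize(math.erf)

def mixture_quantiles(u, weights, means, sds, grid_size=20001):
    """Approximate quantiles of a 1-D Gaussian mixture at probabilities u.

    Hypothetical helper: the CDF is tabulated on a grid and inverted
    by linear interpolation.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # Grid wide enough to cover essentially all of the probability mass.
    lo = (means - 10.0 * sds).min()
    hi = (means + 10.0 * sds).max()
    x = np.linspace(lo, hi, grid_size)
    # Mixture CDF: weighted sum of the component Gaussian CDFs.
    z = (x[:, None] - means) / (sds * math.sqrt(2.0))
    cdf = (0.5 * (1.0 + _erf(z))) @ weights
    # Invert the CDF numerically.
    return np.interp(u, cdf, x)

def w2_distance(p, q, n=10000):
    """Approximate W_2 between mixtures p and q, each (weights, means, sds).

    Uses W_2^2 = integral over (0, 1) of (Q_p(u) - Q_q(u))^2 du,
    evaluated with a midpoint rule on n points.
    """
    u = (np.arange(n) + 0.5) / n
    diff = mixture_quantiles(u, *p) - mixture_quantiles(u, *q)
    return float(np.sqrt(np.mean(diff ** 2)))

if __name__ == "__main__":
    # Two single-component "mixtures": N(0, 1) and N(3, 2^2).
    p = ([1.0], [0.0], [1.0])
    q = ([1.0], [3.0], [2.0])
    print(w2_distance(p, q))
```

For two Gaussians the closed form is $W_2^2 = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$, so the value printed for the example pair should be close to $\sqrt{10}$. Working with quantile functions sidesteps the non-negativity and unit-integral constraints on densities, which is one reason $W_2$ is a convenient metric in this setting.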