Title

Streaming Coresets for Symmetric Tensor Factorization

Authors

Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit

Abstract

Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as a coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present six algorithms that achieve different tradeoffs of coreset size, update time, and working space, beating or matching various state-of-the-art algorithms. In the case of matrices ($2$-order tensors), our online row sampling algorithm guarantees a $(1 \pm \epsilon)$ relative-error spectral approximation. We show applications of our algorithms to learning single topic models.
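To make the matrix ($2$-order) case concrete: the abstract's online row sampling can be sketched as keeping each incoming row with probability proportional to its online (ridge) leverage score, reweighting kept rows so the coreset's Gram matrix is an unbiased estimate of the full one. The sketch below is a minimal illustration of this general technique, not the authors' exact algorithm; the constant `c`, the regularizer `lam`, and the function name are illustrative assumptions.

```python
import numpy as np

def online_row_sample(rows, eps=0.5, delta=0.1, lam=1e-6):
    """Illustrative online leverage-score row sampling (not the paper's
    exact algorithm). Streams over `rows`, keeps each row with probability
    proportional to its online ridge leverage score, and reweights kept
    rows so the coreset spectrally approximates A^T A in expectation."""
    d = rows.shape[1]
    M = lam * np.eye(d)                    # running regularized estimate of A^T A
    c = 8.0 * np.log(d / delta) / eps**2   # oversampling constant (illustrative)
    coreset, weights = [], []
    for a in rows:
        # online leverage score of row a w.r.t. the rows seen so far
        tau = float(a @ np.linalg.solve(M, a))
        p = min(1.0, c * tau)
        if np.random.rand() < p:
            coreset.append(a)
            weights.append(1.0 / np.sqrt(p))   # scale so E[(w a)(w a)^T] = a a^T
        M += np.outer(a, a)                    # update with every row, kept or not
    return np.array(coreset), np.array(weights)
```

Because the kept row count is governed by the sum of online leverage scores rather than by $n$, the coreset is sublinear for well-conditioned streams; each update touches only the $d \times d$ matrix `M`, matching the streaming, small-working-space regime the abstract describes.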
