Paper Title

Supervised Quantile Normalization for Low-rank Matrix Approximation

Authors

Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert

Abstract

Low-rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as \texttt{tf-idf} scaling or data whitening. In this work, we propose to learn these normalization operators jointly with the factorization itself. More precisely, given a $d \times n$ matrix $X$ of $d$ features measured on $n$ individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, which provides new results on top of existing work by Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
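To make the idea of a row-wise, optimal-transport-based quantile normalization more concrete, here is a minimal NumPy sketch. It is not the authors' operator or code. It assumes a squared cost on rescaled values, uniform weights, and a fixed (not learned) vector of target quantile values. It softly maps each row of $X$ onto those targets with entropy-regularized optimal transport (Sinkhorn iterations) and then applies a plain truncated SVD as a stand-in for the factorization step. The function names `soft_quantile_normalize` and `normalize_then_factorize`, and the parameters `eps` and `n_iter`, are hypothetical.

```python
import numpy as np


def logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    zmax = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(zmax, axis=axis) + np.log(np.sum(np.exp(z - zmax), axis=axis))


def soft_quantile_normalize(x, target, eps=1e-2, n_iter=200):
    """Softly map the entries of a 1-D array x onto sorted target quantile values.

    Assumption (not from the paper): squared cost on values rescaled to [0, 1],
    uniform weights on both sides, entropic regularization strength eps.
    """
    x = np.asarray(x, dtype=float)
    target = np.sort(np.asarray(target, dtype=float))
    n, m = x.size, target.size
    # Rescale both sides so that eps has a comparable meaning for every row.
    xs = (x - x.min()) / (x.max() - x.min() + 1e-12)
    ts = (target - target[0]) / (target[-1] - target[0] + 1e-12)
    cost = (xs[:, None] - ts[None, :]) ** 2
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    # Log-domain Sinkhorn updates on the dual potentials f and g.
    for _ in range(n_iter):
        f = eps * (log_a - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - cost) / eps, axis=0))
    log_p = (f[:, None] + g[None, :] - cost) / eps
    # Normalize each row of the transport plan so it sums to 1, then take the
    # plan-weighted average of the target values for every input entry.
    p = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
    return p @ target


def normalize_then_factorize(X, target, rank, eps=1e-2):
    """Row-wise soft quantile normalization followed by a rank-`rank` SVD."""
    Xn = np.vstack([soft_quantile_normalize(row, target, eps) for row in X])
    U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
    return Xn, U[:, :rank] * s[:rank], Vt[:rank]


# Example: 5 heavy-tailed features measured on 30 individuals.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(5, 30))
target = np.linspace(-1.0, 1.0, 10)  # fixed target quantiles (an assumption)
Xn, U, V = normalize_then_factorize(X, target, rank=2)
```

Because every output entry is a weighted average of the target values, the map is smooth in both the inputs and the targets, which is what would allow the targets to be trained by gradient descent together with the factors. In this sketch the targets stay fixed and only the normalized matrix is factorized.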
