A Tale of Two Efficient and Informative Negative Sampling Distributions

Paper Title

A Tale of Two Efficient and Informative Negative Sampling Distributions

Authors

Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava

Abstract

Softmax classifiers with a very large number of classes occur naturally in many applications, such as natural language processing and information retrieval. Computing the full softmax is costly in both computation and energy. Various sampling approaches, popularly known as negative sampling (NS), have been proposed to overcome this challenge. Ideally, NS should sample negative classes from a distribution that depends on the input data, the current parameters, and the correct positive class. Unfortunately, because the parameters and data samples are updated dynamically, there is no sampling scheme that is provably adaptive and also samples the negative classes efficiently. Therefore, alternative heuristics such as random sampling, static frequency-based sampling, or learning-based biased sampling are adopted, which primarily trade off either the sampling cost or the adaptivity of samples per iteration. In this paper, we show two classes of distributions where the sampling scheme is truly adaptive and provably generates negative samples in near-constant time. Our implementation in C++ on CPU is significantly superior, in terms of both wall-clock time and accuracy, to the most optimized TensorFlow implementations of other popular negative sampling approaches on a powerful NVIDIA V100 GPU.
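
As a rough illustration of the baseline the abstract contrasts against, the sketch below compares a full softmax loss with a sampled one whose negatives are drawn uniformly at random (the "random sampling" heuristic). It is not the paper's adaptive scheme. The function names and the toy logits are assumptions made for illustration only.

```python
import math
import random


def full_softmax_loss(logits, positive):
    """Cross-entropy over every class; cost grows linearly with len(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive]


def sample_uniform_negatives(num_classes, positive, k, rng):
    """Draw k distinct negative class ids uniformly, skipping the positive.

    Uniform sampling is cheap but ignores the input and the current
    parameters, which is the non-adaptivity the abstract points out.
    """
    if k > num_classes - 1:
        raise ValueError("k exceeds the number of available negative classes")
    negatives = set()
    while len(negatives) < k:
        c = rng.randrange(num_classes)
        if c != positive:
            negatives.add(c)
    return sorted(negatives)


def sampled_softmax_loss(logits, positive, negatives):
    """Cross-entropy restricted to the positive class plus sampled negatives.

    Only the selected entries of `logits` are read; in a real model only
    those k + 1 scores would need to be computed at all.
    """
    subset = [logits[positive]] + [logits[c] for c in negatives]
    m = max(subset)
    log_z = m + math.log(sum(math.exp(x - m) for x in subset))
    return log_z - logits[positive]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # toy scores
    negs = sample_uniform_negatives(len(logits), positive=7, k=20, rng=rng)
    print("full loss:   ", full_softmax_loss(logits, 7))
    print("sampled loss:", sampled_softmax_loss(logits, 7, negs))
```

Because the sampled normalizer sums over a subset of the classes, the sampled loss never exceeds the full loss. How closely it tracks the full loss depends on which negatives are drawn, which is why the choice of sampling distribution matters.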
