kNN-Embed: Locally Smoothed Embedding Mixtures for Multi-interest Candidate Retrieval

Paper Title

kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

Authors

Ahmed El-Kishky, Thomas Markovich, Kenny Leung, Frank Portman, Aria Haghighi, Ying Xiao

Abstract

Candidate retrieval is the first stage in recommendation systems, where a lightweight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages of the recommender system using a more complex ranking model. Because candidate retrieval sits at the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct "interests" of the user. By querying each of a user's mixture components in proportion to its mixture weight, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open source a large Twitter follow-graph dataset (https://huggingface.co/datasets/Twitter/TwitterFollowGraph), to spur further research in graph-mining and representation learning for recommender systems.
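
The abstract's core idea, querying each mixture component in proportion to its weight, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (allocate_budget, retrieve_multi_interest), the largest-remainder rounding rule, and the brute-force dot-product search standing in for an ANN index are all assumptions made for illustration. The abstract does not specify how clusters are learned or how the mixture is smoothed, so the sketch takes cluster assignments, per-cluster query embeddings, and mixture weights as given inputs.

```python
import numpy as np


def allocate_budget(weights, k):
    """Split k retrieval slots across clusters in proportion to the weights.

    Uses largest-remainder rounding (an assumed choice) so the counts sum to k.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    raw = weights * k
    counts = np.floor(raw).astype(int)
    leftover = k - counts.sum()
    # Give the remaining slots to the clusters with the largest fractional parts.
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


def retrieve_multi_interest(cluster_queries, weights, item_emb, item_cluster, k):
    """Retrieve about k items by querying each cluster with its own embedding.

    cluster_queries: (C, d) one hypothetical query embedding per cluster
    weights:         (C,)   the user's mixture weights over clusters
    item_emb:        (N, d) item embeddings
    item_cluster:    (N,)   cluster id of each item
    Exhaustive dot-product scoring stands in for an ANN index here. Fewer than
    k items are returned if a cluster holds fewer items than its allocation.
    """
    counts = allocate_budget(weights, k)
    results = []
    for c, n in enumerate(counts):
        if n == 0:
            continue
        idx = np.flatnonzero(item_cluster == c)
        if idx.size == 0:
            continue
        scores = item_emb[idx] @ cluster_queries[c]
        top = idx[np.argsort(-scores, kind="stable")[:n]]
        results.extend(top.tolist())
    return results
```

With two equally weighted clusters and k = 2, this sketch returns the best item from each cluster, whereas a single averaged query could return two items from the same cluster.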
