Paper Title

kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

Authors

Ahmed El-Kishky, Thomas Markovich, Kenny Leung, Frank Portman, Aria Haghighi, Ying Xiao

Abstract

Candidate retrieval is the first stage in recommendation systems, where a lightweight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages of the recommender system using a more complex ranking model. Because candidate retrieval sits at the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct "interests" of the user. By querying each of a user's mixture components in proportion to its mixture weight, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open-source a large Twitter follow-graph dataset (https://huggingface.co/datasets/Twitter/TwitterFollowGraph) to spur further research in graph mining and representation learning for recommender systems.
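The retrieval step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how smoothing, per-cluster query embeddings, or budget allocation are computed, so the helper names (`smooth_mixture`, `allocate_budget`, `retrieve`), the averaging-based smoothing, the largest-remainder split, and the brute-force dot-product search standing in for ANN are all assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def smooth_mixture(own, neighbors):
    """Assumed smoothing: sum a user's cluster weights with those of
    neighboring users, then renormalize so the weights sum to 1."""
    total = {}
    for weights in [own, *neighbors]:
        for cluster, w in weights.items():
            total[cluster] = total.get(cluster, 0.0) + w
    norm = sum(total.values())
    return {c: w / norm for c, w in total.items()}


def allocate_budget(mixture, k):
    """Split k candidate slots across clusters in proportion to their
    mixture weights (largest-remainder rounding; weights must sum to 1)."""
    quotas = {c: w * k for c, w in mixture.items()}
    budget = {c: int(q) for c, q in quotas.items()}
    leftover = k - sum(budget.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - budget[c], reverse=True)
    for c in by_remainder[:leftover]:
        budget[c] += 1
    return budget


def retrieve(mixture, queries, cluster_items, k):
    """Query each cluster with its own query embedding and keep that
    cluster's share of the k slots. Brute force stands in for ANN here."""
    results = []
    for cluster, n in allocate_budget(mixture, k).items():
        items = cluster_items.get(cluster, [])
        ranked = sorted(items, key=lambda it: dot(queries[cluster], it[1]), reverse=True)
        results.extend(item_id for item_id, _ in ranked[:n])
    return results


# Hypothetical toy data: two interest clusters with 2-d item embeddings.
cluster_items = {
    "sports": [("s1", [1.0, 0.0]), ("s2", [0.9, 0.1]), ("s3", [0.8, 0.2]), ("s4", [0.1, 0.9])],
    "music": [("m1", [0.0, 1.0]), ("m2", [0.5, 0.5])],
}
queries = {"sports": [1.0, 0.0], "music": [0.0, 1.0]}

# The user has only engaged with sports; a neighbor brings in a music interest.
mixture = smooth_mixture({"sports": 1.0}, [{"sports": 0.5, "music": 0.5}])
candidates = retrieve(mixture, queries, cluster_items, k=4)
print(mixture, candidates)
```

In this toy setup, smoothing gives the user a nonzero weight on a cluster they have not directly engaged with, so the candidate set includes an item from that cluster instead of only the nearest neighbors of a single query embedding.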
