Paper title
Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank
Paper authors
Paper abstract
Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate, via inverse propensity scoring (IPS), for the difference between user clicks and the true relevance of search results. Current propensity estimation methods assume that user click behavior follows the position-based click model (PBM) and estimate click propensities under this assumption. In reality, however, user clicks often follow the cascade click model (CM), in which users scan search results from top to bottom and each click depends on the previous one. In this cascade scenario, PBM-based propensity estimates are inaccurate, which in turn hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to full-information performance when user clicks follow the CM, whereas PBM-based CLTR shows a significant gap to full-information performance. The opposite holds when user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM-based and PBM-based propensity estimation methods using historical user clicks.
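The contrast between the two click assumptions in the abstract can be illustrated with a small simulation. The sketch below is not the paper's CM-IPS estimator, whose details the abstract does not give. It is a toy model under stated assumptions: all function names are hypothetical, relevance is treated as a click probability, and the cascade variant assumes a single `cont_after_click` parameter for the chance that a user keeps scanning after a click.

```python
import random


def simulate_pbm(relevance, exam_prob, rng):
    """PBM: a result is clicked if it is examined and found attractive.

    Examination depends only on the rank, never on other clicks.
    """
    clicks = []
    for k, rel in enumerate(relevance):
        examined = rng.random() < exam_prob[k]
        clicks.append(int(examined and rng.random() < rel))
    return clicks


def simulate_cascade(relevance, cont_after_click, rng):
    """Cascade: the user scans top to bottom and may stop after a click.

    Assumption for this sketch: after a click the user continues with
    probability cont_after_click; without a click they always continue.
    """
    clicks = [0] * len(relevance)
    for k, rel in enumerate(relevance):
        if rng.random() < rel:
            clicks[k] = 1
            if rng.random() >= cont_after_click:
                break  # user leaves, lower ranks are never examined
    return clicks


def cascade_exam_prob(relevance, cont_after_click):
    """Marginal probability that each rank is examined under the toy cascade.

    Unlike the PBM, it depends on the relevance of the results above.
    """
    probs, p = [], 1.0
    for rel in relevance:
        probs.append(p)
        # The user reaches the next rank unless they click and then stop.
        p *= 1.0 - rel * (1.0 - cont_after_click)
    return probs


def ips_weights(clicks, propensities):
    """IPS: weight each click by the inverse of its examination propensity."""
    return [c / p for c, p in zip(clicks, propensities)]


if __name__ == "__main__":
    rng = random.Random(0)
    relevance = [0.5, 0.5, 0.5]
    propensities = cascade_exam_prob(relevance, cont_after_click=0.0)
    clicks = simulate_cascade(relevance, cont_after_click=0.0, rng=rng)
    print(propensities, clicks, ips_weights(clicks, propensities))
```

In this toy setting, the examination probability of a lower rank shrinks whenever a relevant result sits above it. A fixed per-rank propensity, as the PBM assumes, cannot capture this.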