利用相关性以实现低级偏好土匪的更快学习率

论文标题

利用相关性以实现低级偏好土匪的更快学习率

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

论文作者

Ghoshal, Suprovat, Saha, Aadirupa

论文摘要

我们介绍了基于随机效用的选择模型（RUMS）的\ emph {相关的优先匪徒}问题，其中的目标是通过在线subsetwise subsetwise偏好反馈中从给定的$ n $项目中识别出最佳项目。我们研究了具有简单相关结构的模型，例如排名低，会导致更快的学习率。虽然我们表明问题无法解决一般的“低等级”选择模型，但假设更快的学习率可以实现更快的学习率。特别是，我们引入了基于\ emph {block-rank}的新类别朗姆酒模型，其中最好的项目显示为$（ε，δ）$ - PAC可学习，仅使用$ o（rε^{ - 2} \ log（n/δ））$样本。这改善了$ \ tilde {o}（nε^{ - 2} \ log（1/δ））$的标准样本复杂性结合，以通常无法利用项目相关的通常学习算法（$ r \ ll n $）而闻名。我们将上述样品复杂性与匹配的下限（最多达到对数因素）进行补充，证明了我们的分析的紧密度。出乎意料的是，当学习者被迫只演奏对决而不是较大的子集子查询时，我们还显示了$ω（Nε^{ - 2} \ log（1/δ））$的下限。此外，我们将结果扩展到更一般的``\ emph {嘈杂块}'模型，从而确保了我们的技术的鲁棒性。总体而言，我们的结果证明了播放子集中查询比成对偏好$（k = 2）$的优势是合理的，我们表明后者未能利用相关性。

We introduce the \emph{Correlated Preference Bandits} problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for the general `low rank' choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of \emph{Block-Rank} based RUM model, where the best item is shown to be $(ε,δ)$-PAC learnable with only $O(r ε^{-2} \log(n/δ))$ samples. This improves on the standard sample complexity bound of $\tilde{O}(nε^{-2} \log(1/δ))$ known for the usual learning algorithms which might not exploit the item-correlations ($r \ll n$). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Surprisingly, we also show a lower bound of $Ω(nε^{-2}\log(1/δ))$ when the learner is forced to play just duels instead of larger subsetwise queries. Further, we extend the results to a more general `\emph{noisy Block-Rank}' model, which ensures robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences $(k=2)$, we show the latter provably fails to exploit correlation.

下载PDF全文

下载文献需遵守相关版权规定

论文标题