对决匪徒的组合纯探索

论文标题

对决匪徒的组合纯探索

Combinatorial Pure Exploration of Dueling Bandit

论文作者

Chen, Wei, Du, Yihan, Huang, Longbo, Zhao, Haoyu

论文摘要

在本文中，我们研究了针对决斗匪徒的组合纯探索（CPE-DB）：我们有多个候选人的多个候选人，这些位置是由两部分图进行建模的，在每一轮比赛中，我们在一个位置上进行了两次候选人的决斗，并观察他们在决斗中赢得的胜利，目的是与最佳候选位置相匹配，以找到最佳的候选位置，以找到最佳的候选位置。 CPE-DB是针对决斗匪徒的原始组合纯勘探（CPE-MAB）问题的原始组合纯勘探的改编。我们考虑了Borda的获胜者和Condorcet冠军案件。对于Borda获奖者，我们将问题降低到原始的CPE-MAB设置和设计PAC以及确切的算法，这些算法既可以实现与CPE-MAB设置相似的样本复杂性（对于问题的子类）和每回合的多项式运行时间。对于Condorcet获奖者，我们首先设计了一个完全多项式的时间近似方案（FPTA），以解决具有已知获胜概率的Condorcet获奖者的离线问题，然后将FPTA用作甲骨文，以设计新颖的纯探索eLgorithm $ {\ sf car} $ - $ - $ - $ - $ $ {\ sf cond consplity consplity consplity分析。 $ {\ sf car} $ - $ {\ sf cond} $是第一个每轮多项式运行时间的算法，用于识别CPE-DB中的Condorcet获胜者。

In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB): we have multiple candidates for multiple positions as modeled by a bipartite graph, and in each round we sample a duel of two candidates on one position and observe who wins in the duel, with the goal of finding the best candidate-position matching with high probability after multiple rounds of samples. CPE-DB is an adaptation of the original combinatorial pure exploration for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting. We consider both the Borda winner and the Condorcet winner cases. For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm ${\sf CAR}$-${\sf Cond}$ with sample complexity analysis. ${\sf CAR}$-${\sf Cond}$ is the first algorithm with polynomial running time per round for identifying the Condorcet winner in CPE-DB.

下载PDF全文

下载文献需遵守相关版权规定

论文标题