论文标题
帕累托遗憾的分析多臂多武器匪徒
Pareto Regret Analyses in Multi-objective Multi-armed Bandit
论文作者
论文摘要
我们通过提供对抗性的多目标多军匪徒的表述,并定义可以应用于随机和对抗性设置的对抗性多武器多臂匪徒,从而研究多目标多臂强盗的帕累托最优性。遗憾并不依赖于任何标量功能,而是反映了与标量的遗憾相比的帕累托最优性。我们还提出了新算法,假设有和没有多目标多臂强盗设置的事先信息。该算法在对抗性设置中显示出最佳状态,并且几乎最佳地显示了随机设置中的对数因素,同时通过我们已建立的上限和帕累托遗憾的下限。此外,下界的分析表明,新的遗憾与随机设置的现有帕累托遗憾一致,并将对抗性攻击机制从强盗延伸到多目标。
We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting. The algorithms are shown optimal in adversarial settings and nearly optimal up to a logarithmic factor in stochastic settings simultaneously by our established upper bounds and lower bounds on Pareto regrets. Moreover, the lower bound analyses show that the new regrets are consistent with the existing Pareto regret for stochastic settings and extend an adversarial attack mechanism from bandit to the multi-objective one.