Unknown Delay for Adversarial Bandit Setting with Multiple Play

Paper Title

Unknown Delay for Adversarial Bandit Setting with Multiple Play

Author

Odeyomi, Olusola T.

Abstract

This paper addresses the problem of unknown delays in the adversarial multi-armed bandit (MAB) setting with multiple play. Existing work on similar settings has focused only on the case where the learner selects a single arm in each round. However, many applications in robotics require the learner to select more than one arm per round, so it is worthwhile to investigate the effect of delay when multiple arms are chosen. In this setting, the arms chosen in a given round all experience the same delay. Feedback losses from different combinations of arms selected in different rounds can arrive aggregated, and the learner faces the challenge of associating each feedback loss with the arm that produced it. To address this problem, the paper proposes a delayed exponential, exploitation and exploration for multiple play (DEXP3.M) algorithm. Its regret bound is only slightly worse than that of DEXP3, which was previously proposed for the single-play setting with unknown delay.
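The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the DEXP3.M algorithm, whose details are not given here; it is a generic exponential-weights learner that plays `m` arms per round and receives the losses after a per-round delay shared by all arms played in that round. Several choices are assumptions made for illustration only: the function names (`inclusion_probs`, `systematic_sample`, `run_delayed_multiplay`), the learning rate `eta`, the use of systematic sampling to pick distinct arms, the assumption that arriving feedback is tagged with its arm but not with its originating round, and the decision to importance-weight each loss by the arm's current selection probability.

```python
import math
import random
from collections import defaultdict


def inclusion_probs(weights, m):
    """Probabilities proportional to weights, capped at 1, summing to m."""
    k = len(weights)
    if not 0 < m < k:
        raise ValueError("need 0 < m < number of arms")
    capped = set()
    while True:
        free = [i for i in range(k) if i not in capped]
        scale = (m - len(capped)) / sum(weights[i] for i in free)
        over = [i for i in free if weights[i] * scale > 1.0]
        if not over:
            break
        capped.update(over)  # these arms are played with probability 1
    return [1.0 if i in capped else weights[i] * scale for i in range(k)]


def systematic_sample(probs, m, rng):
    """Pick m distinct arms; arm i is included with probability probs[i]."""
    point = rng.random()
    chosen, upper, i = [], probs[0], 0
    for _ in range(m):
        # Advance to the arm whose cumulative interval contains the point.
        while upper <= point and i < len(probs) - 1:
            i += 1
            upper += probs[i]
        chosen.append(i)
        point += 1.0
    return chosen


def run_delayed_multiplay(losses, delays, m, eta, seed=0):
    """losses[t][i] lies in [0, 1]; delays[t] is shared by all arms played in round t."""
    rng = random.Random(seed)
    k = len(losses[0])
    weights = [1.0] * k
    arrivals = defaultdict(list)  # arrival round -> [(arm, loss)], round tag dropped
    history = []
    for t in range(len(losses)):
        probs = inclusion_probs(weights, m)
        played = systematic_sample(probs, m, rng)
        history.append(played)
        arrivals[t + delays[t]].extend((i, losses[t][i]) for i in played)
        # Feedback from several earlier rounds may arrive together in this round.
        for arm, loss in arrivals.pop(t, []):
            # Assumption: weight by the current probability, since the
            # originating round of the feedback is unknown to the learner.
            weights[arm] *= math.exp(-eta * loss / probs[arm])
        top = max(weights)
        # Rescale and floor the weights to avoid underflow.
        weights = [max(w / top, 1e-12) for w in weights]
    return weights, history
```

For example, calling `run_delayed_multiplay(losses, delays, m=2, eta=0.1)` with four arms, two of which always incur loss 0 and two of which always incur loss 1, and with delays cycling through 0 to 3, should leave the two lossless arms with the largest weights. Feedback that would arrive after the final round is simply never observed in this sketch.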
