Paper title
Multi-Agent Low-Dimensional Linear Bandits
Paper authors
Paper abstract
We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $\theta^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $\theta^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm in which agents communicate subspace indices to each other, and each agent plays a projected variant of LinUCB on the corresponding (low-dimensional) subspace. By distributing the search for the optimal subspace across agents, and by having each agent learn the unknown vector within the corresponding low-dimensional subspace, we show that the per-agent finite-time regret is much smaller than when agents do not communicate. Finally, we complement these results with simulations.
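To make the phrase "projected variant of LinUCB" concrete, the following is a minimal sketch of what one agent might do once it has settled on a candidate subspace. It is not the paper's algorithm: the class name `ProjectedLinUCB`, the helpers `dot` and `project`, the fixed confidence width `beta`, and the ridge parameter `reg` are all assumptions made for illustration, and the exchange of subspace indices over the communication graph is omitted entirely. The only idea carried over from the abstract is that actions are projected onto an $m$-dimensional subspace (with $m$ smaller than $d$) and the usual LinUCB statistics are kept in those $m$ coordinates.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(basis, x):
    """Coordinates of x in the subspace spanned by the orthonormal vectors in `basis`."""
    return [dot(b, x) for b in basis]


class ProjectedLinUCB:
    """Illustrative LinUCB restricted to one candidate subspace (hypothetical sketch)."""

    def __init__(self, basis, reg=1.0, beta=1.0):
        self.basis = basis  # m orthonormal vectors in R^d (assumed given)
        m = len(basis)
        # Inverse of the regularized Gram matrix V = reg * I in subspace coordinates.
        self.v_inv = [[(1.0 / reg if i == j else 0.0) for j in range(m)]
                      for i in range(m)]
        self.b = [0.0] * m  # running sum of reward * projected action
        self.beta = beta    # assumed constant confidence width, not a tuned schedule

    def _v_inv_times(self, z):
        return [dot(row, z) for row in self.v_inv]

    def estimate(self):
        """Ridge estimate of the unknown vector in subspace coordinates."""
        return self._v_inv_times(self.b)

    def select(self, actions):
        """Index of the action with the largest upper confidence bound."""
        theta_hat = self.estimate()

        def ucb(x):
            z = project(self.basis, x)
            width = math.sqrt(max(dot(z, self._v_inv_times(z)), 0.0))
            return dot(theta_hat, z) + self.beta * width

        return max(range(len(actions)), key=lambda i: ucb(actions[i]))

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison) update of V^{-1} and of b after playing x."""
        z = project(self.basis, x)
        u = self._v_inv_times(z)  # V^{-1} z; V^{-1} is symmetric
        denom = 1.0 + dot(z, u)
        m = len(z)
        for i in range(m):
            for j in range(m):
                self.v_inv[i][j] -= u[i] * u[j] / denom
        for i in range(m):
            self.b[i] += reward * z[i]


if __name__ == "__main__":
    # Toy setting: d = 4, candidate subspace spanned by the first two axes.
    basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    agent = ProjectedLinUCB(basis)
    actions = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    chosen = agent.select(actions)
    agent.update(actions[chosen], reward=0.5)
    print(chosen, agent.estimate())
```

Working in the $m$ projected coordinates means the Gram matrix and the confidence width scale with the subspace dimension rather than with $d$, which is the intuition behind the smaller per-agent regret claimed in the abstract. An action orthogonal to the candidate subspace projects to zero, so in this sketch it has zero confidence width and is never preferred for exploration.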