Multi-Agent Low-Dimensional Linear Bandits

Paper Title

Multi-Agent Low-Dimensional Linear Bandits

Authors

Ronshee Chawla, Abishek Sankararaman, Sanjay Shakkottai

Abstract

We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $\theta^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $\theta^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm in which agents communicate subspace indices with each other and each agent plays a projected variant of LinUCB on the corresponding (low-dimensional) subspace. By distributing the search for the optimal subspace across users, and having each agent learn the unknown vector within the corresponding low-dimensional subspace, we show that the per-agent finite-time regret is much smaller than when agents do not communicate. Finally, we complement these results with simulations.
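The per-agent step described above, LinUCB played on a low-dimensional subspace, can be sketched as ordinary LinUCB run in the subspace's coordinates. This is a minimal pure-Python sketch under stated assumptions, not the authors' algorithm: it assumes the subspace is given as an orthonormal basis `U` (m vectors in R^d), uses a fixed exploration weight `beta` instead of a time-dependent confidence radius, and omits the exchange of subspace indices between agents entirely. The names `solve`, `ProjectedLinUCB`, `run_demo`, `lam`, and `beta` are hypothetical, and the demo instance (d = 4, $\theta^*$ in the span of the first two coordinates, Gaussian noise) is made up for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b for a small dense system via Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class ProjectedLinUCB:
    """Hypothetical sketch: LinUCB in the m coordinates of span(U), U = m orthonormal d-vectors."""

    def __init__(self, basis, lam=1.0, beta=1.0):
        self.U = basis
        m = len(basis)
        # Regularized Gram matrix lam*I + sum z z^T, only m x m instead of d x d
        self.V = [[lam if i == j else 0.0 for j in range(m)] for i in range(m)]
        self.b = [0.0] * m  # running sum of reward * z
        self.beta = beta  # fixed exploration weight (simplifying assumption)

    def project(self, x):
        # z = U^T x: coordinates of the action in the subspace basis
        return [sum(u_i * x_i for u_i, x_i in zip(u, x)) for u in self.U]

    def ucb(self, x):
        z = self.project(x)
        theta_hat = solve(self.V, self.b)  # ridge estimate of the m-dim parameter
        v_inv_z = solve(self.V, z)
        mean = sum(t * zi for t, zi in zip(theta_hat, z))
        width = math.sqrt(max(0.0, sum(zi * wi for zi, wi in zip(z, v_inv_z))))
        return mean + self.beta * width

    def choose(self, arms):
        # Optimistic choice: index of the arm with the largest upper confidence bound
        return max(range(len(arms)), key=lambda k: self.ucb(arms[k]))

    def update(self, x, reward):
        z = self.project(x)
        for i in range(len(z)):
            self.b[i] += reward * z[i]
            for j in range(len(z)):
                self.V[i][j] += z[i] * z[j]


def run_demo(T=300, seed=0, noise=0.1):
    """Single agent, d = 4, theta* in span(e1, e2), arms are the standard basis vectors."""
    rng = random.Random(seed)
    theta_star = [0.2, 0.9, 0.0, 0.0]
    arms = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
    agent = ProjectedLinUCB(basis=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
    picks = []
    for _ in range(T):
        k = agent.choose(arms)
        reward = sum(a * t for a, t in zip(arms[k], theta_star)) + rng.gauss(0.0, noise)
        agent.update(arms[k], reward)
        picks.append(k)
    return picks


if __name__ == "__main__":
    picks = run_demo()
    print("share of best arm in last 100 rounds:", picks[-100:].count(1) / 100)
```

Because every action is reduced to its m coordinates $z = U^\top x$, the confidence ellipsoid lives in $\mathbb{R}^m$ rather than $\mathbb{R}^d$, and arms orthogonal to the subspace get zero bonus. That only helps if the chosen subspace actually contains $\theta^*$; with a wrong subspace the model is misspecified, which is presumably why agents still need to search over, and recommend, subspace indices.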
