Paper Title
Learning in Distributed Contextual Linear Bandits Without Sharing the Context
Paper Authors
Paper Abstract
The contextual linear bandit is a rich and theoretically important model with many practical applications. Recently, this setup has gained a lot of interest in applications over wireless networks, where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner, who performs the learning without seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses $\approx 5d$ bits per context when the context distribution is unknown and $0$ bits per context when the context distribution is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of the horizon, while the latter achieves information-theoretic tightness.
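The abstract does not spell out how a context is compressed to roughly $5d$ bits, so the sketch below is only an illustration of the general idea of per-coordinate stochastic quantization with a few bits per dimension, not the paper's actual encoding scheme. The function name stochastic_quantize, the bits_per_coord parameter, and the assumption that context coordinates lie in a bounded interval are all illustrative choices introduced here.

```python
import numpy as np

def stochastic_quantize(context, bits_per_coord=5, radius=1.0, rng=None):
    """Illustrative sketch (not the paper's scheme): quantize a d-dimensional
    context to bits_per_coord bits per coordinate using unbiased stochastic
    rounding, so that E[dequantized] equals the original context."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(context, dtype=float)
    levels = 2 ** bits_per_coord - 1           # indices 0..levels fit in bits_per_coord bits
    scaled = (x + radius) / (2 * radius) * levels  # map [-radius, radius] -> [0, levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiased rounding).
    indices = lower + (rng.random(x.shape) < (scaled - lower))
    indices = np.clip(indices, 0, levels).astype(int)      # what the agent would transmit
    dequantized = indices / levels * 2 * radius - radius   # learner's reconstruction
    return indices, dequantized

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    context = rng.uniform(-1, 1, size=d)
    idx, ctx_hat = stochastic_quantize(context, bits_per_coord=5, rng=rng)
    print("bits sent for this context:", d * 5)
    print("max reconstruction error:", np.max(np.abs(context - ctx_hat)))
```

With 5 bits per coordinate the agent sends $5d$ bits per context, matching the order of the budget quoted in the abstract; the unbiasedness of the rounding is what generically lets a learner work with quantized contexts at little cost in regret, though the guarantees in the paper rely on its own construction rather than this sketch.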