论文标题
低级广义线性匪徒问题
Low-Rank Generalized Linear Bandit Problems
论文作者
论文摘要
在低级别的线性匪徒问题中,动作的奖励(由大小为$ d_1 \ times d_2 $表示的矩阵)是动作与未知的低率矩阵$θ^*$之间的内部产品。我们提出了一种基于在线到信心集转换的新组合〜\ citep {abbasi2012online}和指数加权的平均预报器,该算法由低级别矩阵构建。在$ t $ rounds中,我们的算法达到$ \ widetilde {o}(((d_1+d_2)^{3/2} \ sqrt {rt})$遗憾,它以标准的线性强盗对$ \ wideTilde {o}(d_1d_2 \ s $} $} $}的范围{ $ r \ ll \ min \ {d_1,d_2 \} $。我们还将算法方法扩展到广义线性设置,以获取一种算法,该算法在链接函数的规律性条件下享有相似的界限。为了围绕基于覆盖方法的计算可行性,我们通过扩展〜\ citet {2019年6月的“探索 - spass-then-refine”算法来提出有效的算法。我们有效的算法达到了$ \ widetilde {o}(((d_1+d_2)^{3/2} \ sqrt {rt})$在动作集$ \ mathcal {x} $和$ r $ - $ r $ $ $ the $ tum $ the $ the $ the $ the $ the Mathcal {x}的温和条件下遗憾。我们的上限与\ cite {2019bilinear}的猜想下限相匹配,对于低级线性匪徒问题的子类。此外,我们表明,稀疏线性匪徒问题的现有下限强烈表明我们的遗憾界限是不可解决的。为了补充我们的理论贡献,我们还进行了实验,以证明当$θ^*$是低级时,我们的算法大大胜过标准线性匪徒进近的性能。
In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $Θ^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion~\citep{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. In $T$ rounds, our algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret that improves upon the standard linear bandit regret bound of $\widetilde{O}(d_1d_2\sqrt{T})$ when the rank of $Θ^*$: $r \ll \min\{d_1,d_2\}$. We also extend our algorithmic approach to the generalized linear setting to get an algorithm which enjoys a similar bound under regularity conditions on the link function. To get around the computational intractability of covering based approaches, we propose an efficient algorithm by extending the "Explore-Subspace-Then-Refine" algorithm of~\citet{jun2019bilinear}. Our efficient algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret under a mild condition on the action set $\mathcal{X}$ and the $r$-th singular value of $Θ^*$. Our upper bounds match the conjectured lower bound of \cite{jun2019bilinear} for a subclass of low-rank linear bandit problems. Further, we show that existing lower bounds for the sparse linear bandit problem strongly suggest that our regret bounds are unimprovable. To complement our theoretical contributions, we also conduct experiments to demonstrate that our algorithm can greatly outperform the performance of the standard linear bandit approach when $Θ^*$ is low-rank.