Paper Title

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

Paper Authors

Xi Chen, Zehua Lai, He Li, Yichen Zhang

Paper Abstract

With the fast development of big data, it has become easier than before to learn the optimal decision rule by updating the decision rule recursively and making online decisions. We study the online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for online and adaptive data collection environments that can update decision rules via weighted stochastic gradient descent. We allow different weighting schemes for the stochastic gradient and establish the asymptotic normality of the parameter estimator. Our proposed estimator significantly improves the asymptotic efficiency over the previous averaged SGD approach via inverse probability weights. We also conduct an optimality analysis on the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that the remainder term in the Bahadur representation entails a slower convergence rate compared to classical SGD due to the adaptive data collection.
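
The core algorithmic idea described in the abstract, weighted stochastic gradient descent with inverse-probability weights plus iterate averaging under adaptively collected bandit data, can be illustrated in a few lines. Below is a minimal sketch in Python under assumptions not taken from the paper: a toy two-armed linear contextual bandit, an epsilon-greedy policy, normalized contexts, and a step size of the form n^{-alpha}. The simple 1/propensity weight shown is one plausible weighting scheme; the paper analyzes a general family of weights and derives their optimal choice in the linear setting.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 3, 2, 50_000                # context dimension, number of arms, horizon
theta_true = rng.normal(size=(K, d))  # ground-truth arm parameters (simulation only)
eps = 0.2                             # epsilon-greedy exploration rate

theta = np.zeros((K, d))      # running SGD iterates, one vector per arm
theta_bar = np.zeros((K, d))  # Polyak-Ruppert averages: the inference target
counts = np.zeros(K)          # number of updates each arm has received

for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)    # bounded contexts keep this toy demo numerically stable

    # Epsilon-greedy action selection; probs[a] is the propensity of arm a given x.
    greedy = int(np.argmax(theta @ x))
    probs = np.full(K, eps / K)
    probs[greedy] += 1.0 - eps
    a = rng.choice(K, p=probs)

    y = theta_true[a] @ x + rng.normal()  # noisy linear reward for the chosen arm

    # Weighted SGD step on the squared loss of arm a. The gradient of
    # 0.5 * (x^T theta_a - y)^2 w.r.t. theta_a is (x^T theta_a - y) * x, and
    # w = 1 / probs[a] is an inverse-probability weight compensating for the
    # adaptive, policy-dependent sampling of arms (one plausible scheme; the
    # paper's optimal weights may differ).
    counts[a] += 1
    eta = 0.1 * counts[a] ** -0.501   # step size ~ n^{-alpha}, alpha in (1/2, 1)
    w = 1.0 / probs[a]
    theta[a] -= eta * w * (theta[a] @ x - y) * x

    # Online Polyak-Ruppert averaging of the iterates.
    theta_bar[a] += (theta[a] - theta_bar[a]) / counts[a]

print("averaged estimate, arm 0:", np.round(theta_bar[0], 3))
print("ground truth,      arm 0:", np.round(theta_true[0], 3))
```

In this sketch it is the averaged iterate theta_bar, not the raw SGD iterate, that plays the role of the estimator whose asymptotic normality the paper establishes; the weighting only enters the gradient step.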
