Paper Title

Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

Authors

Naresh Manwani, Mudit Agarwal

Abstract

In this paper, we present an online algorithm called {\it Delaytron} for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^T d_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. In the case when the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^T d_t+\vert\mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^T d_t$. For the case when $T$ and $\sum_{t=1}^T d_t$ are unknown, we use the doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^T d_t}\right)$. We show the effectiveness of our approach by experimenting on various datasets and comparing with state-of-the-art approaches.
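The interaction protocol from the abstract (observe $\mathbf{x}_t$, predict $\tilde{y}_t$, receive $\mathbb{I}[\tilde{y}_t=y_t]$ after $d_t$ rounds, and treat rounds with $t+d_t>T$ as missing) can be sketched as below. This is a minimal illustration only: the data, the delay distribution, and the perceptron-style mistake update are all assumptions, not the paper's actual Delaytron update rule.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T = 3, 5, 20                      # classes, feature dim, rounds (illustrative)
W = np.zeros((K, d))                    # linear multiclass predictor (assumed form)
delays = rng.integers(0, 4, size=T)     # delay sequence d_t, unknown to the learner
pending = {}                            # arrival round -> list of (x, y_pred, feedback)
missing = []                            # rounds whose feedback never arrives (t + d_t > T)

for t in range(T):
    x = rng.normal(size=d)
    y_true = int(rng.integers(K))       # hidden true label, never shown directly
    y_pred = int(np.argmax(W @ x))      # predict a label \tilde{y}_t
    feedback = int(y_pred == y_true)    # bandit feedback I[\tilde{y}_t = y_t]
    arrival = t + int(delays[t])
    if arrival >= T:
        missing.append(t)               # feedback for round t is missing
    else:
        pending.setdefault(arrival, []).append((x, y_pred, feedback))
    # process all feedback arriving this round, delayed from earlier rounds
    for xs, yp, fb in pending.pop(t, []):
        if fb == 0:                     # illustrative update on a revealed mistake
            W[yp] -= 0.1 * xs

print(len(missing))
```

The dictionary keyed by arrival round is one simple way to buffer delayed feedback; any round whose feedback would land after round $T$ goes to `missing`, matching the abstract's definition.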
