重复的第一价格拍卖中的最佳无重组学习

论文标题

重复的第一价格拍卖中的最佳无重组学习

Optimal No-regret Learning in Repeated First-price Auctions

论文作者

Han, Yanjun, Zhou, Zhengyuan, Weissman, Tsachy

论文摘要

我们在重复的一位价格拍卖中研究在线学习，投标人只观察每次拍卖结束时获胜的出价，学会了适应性的出价，以最大程度地提高她的累积收益。为了实现这一目标，投标人面临审查的反馈：如果她赢得了出价，那么她将无法观察其他投标人的最高出价，我们认为这是\ textit {iid}从未知分布中汲取的。在本文中，我们开发了第一种学习算法，该算法获得了近乎最佳的$ \ wideTilde {o}（\ sqrt {t}）$遗憾，通过利用第一优先拍卖的两个结构属性，即特定的反馈结构和回报函数。我们首先将第一价格拍卖中的反馈结构作为部分订购的上下文匪徒，跨动作（BID）的图形反馈的组合，跨环境（私人价值）（私人价值）的交叉学习以及在上下文中的部分顺序。我们通过表现出一个奇怪的分离来确定该框架的优势和劣势，即在随机环境中几乎可以独立于行动/上下文大小的遗憾，但在对抗性环境下是不可能的。特别地，此框架导致$ O（\ sqrt {t} \ log^{2.5} t）$遗憾的是，当投标人的私人价值观为\ emph {iid}时，首先拍卖。尽管上述框架受到限制，但我们进一步利用了第一价格拍卖的特殊回报函数，即使在存在对抗性生成的私有值的情况下，即使存在样本有效算法也是如此。我们建立了一个$ o（\ sqrt {t} \ log^3 t）$遗憾的遗憾，因此为此算法提供了完整的表征，可以为第一价拍卖提供最佳的学习保证。

We study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces censored feedback: if she wins the bid, then she is not able to observe the highest bid of the other bidders, which we assume is \textit{iid} drawn from an unknown distribution. In this paper, we develop the first learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound, by exploiting two structural properties of first-price auctions, i.e. the specific feedback structure and payoff function. We first formulate the feedback structure in first-price auctions as partially ordered contextual bandits, a combination of the graph feedback across actions (bids), the cross learning across contexts (private values), and a partial order over the contexts. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. In particular, this framework leads to an $O(\sqrt{T}\log^{2.5}T)$ regret for first-price auctions when the bidder's private values are \emph{iid}. Despite the limitation of the above framework, we further exploit the special payoff function of first-price auctions to develop a sample-efficient algorithm even in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.

下载PDF全文

下载文献需遵守相关版权规定

论文标题