Paper Title
Applying Opponent Modeling for Automatic Bidding in Online Repeated Auctions
Paper Authors
Abstract
Online auction scenarios, such as bidding for search advertisements on advertising platforms, often require bidders to participate repeatedly in auctions for identical or similar items. Most previous studies have only considered the process by which the seller learns the prior-dependent optimal mechanism in a repeated auction. In this paper, by contrast, we define a multiagent reinforcement learning environment in which strategic bidders and the seller learn their strategies simultaneously, and we design an automatic bidding algorithm that updates the bidders' strategies through online interactions. We propose Bid Net to replace the linear shading function as the representation of a strategic bidder's strategy, which effectively improves the utility of the strategies that bidders learn. We apply and adapt opponent modeling methods to design the PG (pseudo-gradient) algorithm, which allows bidders to learn optimal bidding strategies by predicting the other agents' strategy transitions. We prove that when a bidder uses the PG algorithm, it learns the best response to static opponents, and that when all bidders adopt the PG algorithm, the system converges to the equilibrium of the game induced by the auction. In experiments with diverse environment settings and varying opponent strategies, the PG algorithm maximizes bidders' utility. We hope that this article will inspire further research on automatic bidding strategies for strategic bidders.
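To make the contrast between the two strategy representations concrete, the following is a minimal sketch (not the paper's implementation): it compares a fixed linear shading rule with a small neural "bid net" that maps a private value to a bid. The PyTorch setup, layer sizes, and names (BidNet, linear_shading_bid, alpha) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: linear shading vs. a neural bid-strategy representation.
import torch
import torch.nn as nn

def linear_shading_bid(value: torch.Tensor, alpha: float = 0.8) -> torch.Tensor:
    """Linear shading: bid a fixed fraction alpha of the private value."""
    return alpha * value

class BidNet(nn.Module):
    """Hypothetical neural bid function: private value -> bid in [0, value]."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # value-dependent shading factor in (0, 1)
        )

    def forward(self, value: torch.Tensor) -> torch.Tensor:
        shading = self.net(value.unsqueeze(-1)).squeeze(-1)
        return shading * value  # bid never exceeds the private value

if __name__ == "__main__":
    values = torch.rand(5)                # private values drawn in [0, 1)
    print(linear_shading_bid(values))     # fixed-fraction bids
    print(BidNet()(values).detach())      # learned, value-dependent shading
```

Under this sketch, the linear rule applies the same shading factor to every value, whereas the network can learn a different shading factor for each private value, which is the flexibility the abstract attributes to Bid Net.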