Paper Title

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

Paper Authors

Liu, Larkin

Paper Abstract

We investigate Nash equilibrium learning in a competitive Markov Game (MG) environment, where multiple agents compete and multiple Nash equilibria can exist. In particular, for an oligopolistic dynamic pricing environment, exact Nash equilibria are difficult to obtain due to the curse of dimensionality. We develop a new model-free method to find approximate Nash equilibria. Gradient-free black-box optimization is then applied to estimate $ε$, the maximum reward advantage an agent can gain by unilaterally deviating from any joint policy, and to estimate the $ε$-minimizing policy for any given state. The policy-$ε$ correspondence and the map from states to $ε$-minimizing policies are represented by neural networks, the latter being the Nash Policy Net. During batch updates, we perform Nash Q-learning on the system, adjusting the action probabilities using the Nash Policy Net. We demonstrate that an approximate Nash equilibrium can be learned, particularly in the dynamic pricing domain where exact solutions are often intractable.
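
To make the $ε$ quantity concrete: for a joint policy $\pi$, $ε(\pi) = \max_i \max_{\pi_i'} \left[ V_i(\pi_i', \pi_{-i}) - V_i(\pi) \right]$, the largest return improvement any single agent can secure by deviating while the others hold their policies fixed; a joint policy with small $ε(\pi)$ is an approximate ($ε$-)Nash equilibrium. The sketch below shows how such an estimate could be formed with random search standing in for the gradient-free black-box optimizer. It is not the paper's implementation; `evaluate_return`, `env`, and all parameter names are hypothetical.

```python
# Minimal sketch (assumptions noted above): estimate eps(pi) for a joint
# policy by random search over unilateral deviations, one agent at a time.
import numpy as np

def estimate_epsilon(env, joint_policy, evaluate_return, n_candidates=64, rng=None):
    """Approximate eps(pi) = max_i max_{pi_i'} [V_i(pi_i', pi_-i) - V_i(pi)].

    joint_policy: list of per-agent policy parameter vectors (numpy arrays).
    evaluate_return(env, policies, agent) -> Monte Carlo estimate of the
        agent's expected return under the given joint policy (hypothetical).
    """
    rng = rng or np.random.default_rng(0)
    eps = 0.0
    for i, params in enumerate(joint_policy):
        base = evaluate_return(env, joint_policy, agent=i)
        best_dev = base
        for _ in range(n_candidates):
            # Perturb only agent i's parameters: a random-search stand-in
            # for any gradient-free black-box optimizer.
            candidate = params + rng.normal(scale=0.1, size=params.shape)
            deviated = list(joint_policy)
            deviated[i] = candidate
            best_dev = max(best_dev, evaluate_return(env, deviated, agent=i))
        eps = max(eps, best_dev - base)  # best unilateral advantage for agent i
    return eps
```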
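
The batch update can be sketched similarly. Assuming the Nash Policy Net outputs a mixed strategy over actions at the next state, the Nash Q-learning target replaces single-agent Q-learning's max with an expectation under that strategy. This is an assumed form, not the paper's code; `q_net`, `nash_policy_net`, and the batch field names are hypothetical.

```python
# Minimal sketch: Nash Q-learning batch targets where the next-state action
# distribution comes from a Nash Policy Net (state -> mixed strategy).
import torch

def nash_q_targets(batch, q_net, nash_policy_net, gamma=0.99):
    """Compute r + gamma * E_{a' ~ pi_Nash(s')}[Q(s', a')] for a batch.

    batch: dict with tensors 'reward' (B,), 'next_state' (B, state_dim),
           'done' (B,) for one agent's transitions.
    q_net(next_state) -> (B, n_actions) action values.
    nash_policy_net(next_state) -> (B, n_actions) action probabilities.
    """
    with torch.no_grad():
        q_next = q_net(batch["next_state"])             # (B, n_actions)
        pi_next = nash_policy_net(batch["next_state"])  # (B, n_actions)
        # Expectation under the approximate Nash policy, rather than a max:
        # the key difference from single-agent Q-learning.
        v_next = (pi_next * q_next).sum(dim=-1)         # (B,)
        return batch["reward"] + gamma * (1.0 - batch["done"]) * v_next
```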
