学习使用对抗培训的在线分配问题的强大算法

论文标题

学习使用对抗培训的在线分配问题的强大算法

Learning Robust Algorithms for Online Allocation Problems Using Adversarial Training

论文作者

Zuzic, Goran, Wang, Di, Mehta, Aranyak, Sivakumar, D.

论文摘要

我们解决了使用机器学习方法找到在线分配算法（即双方匹配）的挑战。在本文中，我们专注于AdWords问题，这是一个经典的在线预算匹配问题，具有理论和实际意义。与现有工作相反，我们的目标是完成算法设计{\ em tabula rasa}，即，没有任何人为预见的见解或专家调整的培训数据，而不是指定优化问题的目标和约束。我们根据游戏理论，对抗性训练和甘斯的关键来构建一个框架，这是我们生成对对抗性示例的示例，以暴露任何给定算法的弱点。在我们上下文中，一个独特的挑战是从头开始生成完整的示例，而不是在给定的示例中扰动，我们证明可以解决AdWords问题的实现。我们使用此框架来共同培训算法网络和对抗网络相互对立，直到它们融合到平衡为止。这种方法找到了与已知最佳结果一致的算法和对抗性示例。其次，我们解决了算法的鲁棒性问题，即我们可以设计在实际分布下既有强大的算法，又可以针对对抗性实例展现出强大的性能。为此，我们使用诸如幂律的对抗和实用分布的混合物训练算法网络；最终的网络在两个输入方案之间表现出平稳的权衡。

We address the challenge of finding algorithms for online allocation (i.e. bipartite matching) using a machine learning approach. In this paper, we focus on the AdWords problem, which is a classical online budgeted matching problem of both theoretical and practical significance. In contrast to existing work, our goal is to accomplish algorithm design {\em tabula rasa}, i.e., without any human-provided insights or expert-tuned training data beyond specifying the objective and constraints of the optimization problem. We construct a framework based on insights and ideas from game theory, adversarial training and GANs Key to our approach is to generate adversarial examples that expose the weakness of any given algorithm. A unique challenge in our context is to generate complete examples from scratch rather than perturbing given examples and we demonstrate this can be accomplished for the Adwords problem. We use this framework to co-train an algorithm network and an adversarial network against each other until they converge to an equilibrium. This approach finds algorithms and adversarial examples that are consistent with known optimal results. Secondly, we address the question of robustness of the algorithm, namely can we design algorithms that are both strong under practical distributions, as well as exhibit robust performance against adversarial instances. To accomplish this, we train algorithm networks using a mixture of adversarial and practical distributions like power-laws; the resulting networks exhibit a smooth trade-off between the two input regimes.

下载PDF全文

下载文献需遵守相关版权规定

论文标题