Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning

Paper Title

Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning

Authors

Huang, Zeren; Chen, Wenhao; Zhang, Weinan; Shi, Chuhan; Liu, Furui; Zhen, Hui-Ling; Yuan, Mingxuan; Hao, Jianye; Yu, Yong; Wang, Jun

Abstract

Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during previous solution processes, learning-to-branch methods have recently become superior to heuristics. As branch-and-bound is naturally a sequential decision-making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic at each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utilities of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish the promising samples from the long-term or short-term view, and train the branching model, named Branch Ranking, via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust, and generalizes better to large-scale MIP instances, than the widely used heuristics and state-of-the-art learning-based branching models.
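The abstract does not spell out the ranking-based reward assignment, so the following is only a minimal sketch of one plausible reading, not the authors' scheme. It assumes each logged branching sample carries a scalar long-term return (for example, the negated size of the resulting search tree), gives a reward of 1 to the top-ranked fraction and 0 to the rest, and uses those rewards to weight a log-likelihood loss. The names `assign_rank_rewards`, `reward_weighted_nll`, and `top_fraction` are hypothetical.

```python
import math


def assign_rank_rewards(returns, top_fraction=0.1):
    """Give reward 1.0 to the top-ranked samples by return, 0.0 to the rest.

    `returns` is an assumed per-sample long-term utility (higher is better);
    `top_fraction` is a hypothetical hyperparameter, not taken from the paper.
    """
    if not returns:
        return []
    n_top = max(1, math.ceil(top_fraction * len(returns)))
    # Indices sorted by descending return; ties keep their original order.
    order = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    rewards = [0.0] * len(returns)
    for i in order[:n_top]:
        rewards[i] = 1.0
    return rewards


def reward_weighted_nll(action_probs, rewards):
    """Reward-weighted negative log-likelihood of the logged branching actions.

    `action_probs[i]` is the probability the policy assigns to the branching
    variable that was actually chosen in sample i.
    """
    total = sum(rewards)
    if total == 0:
        return 0.0
    return -sum(r * math.log(p) for p, r in zip(action_probs, rewards)) / total


if __name__ == "__main__":
    # Hypothetical returns: negated node counts of four logged trajectories.
    sample_returns = [-120.0, -35.0, -80.0, -10.0]
    sample_rewards = assign_rank_rewards(sample_returns, top_fraction=0.5)
    print(sample_rewards)
    print(reward_weighted_nll([0.5, 0.5, 0.5, 0.5], sample_rewards))
```

With binary rewards this reduces to imitation learning on the top-ranked samples only; a scheme that also credits short-term promising samples, as the abstract suggests, would need a further reward level.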
