Paper Title
Graph neural networks-based Scheduler for Production planning problems using Reinforcement Learning
Paper Authors
Paper Abstract
Reinforcement learning (RL) is increasingly adopted for job shop scheduling problems (JSSP). However, RL for JSSP typically uses a vectorized representation of machine features as the state space, which has three major problems: (1) the relationship between the machine units and the job sequence is not fully captured, (2) the size of the state space grows exponentially with the number of machines and jobs, and (3) the agent generalizes poorly to unseen scenarios. We present a novel framework, GraSP-RL: a GRAph neural network-based Scheduler for Production planning problems using Reinforcement Learning. It represents the JSSP as a graph and trains the RL agent on features extracted with a graph neural network (GNN). While the graph itself lies in a non-Euclidean space, the GNN-extracted features provide a rich encoding of the current production state in Euclidean space, which the RL agent then uses to select the next job. Further, we cast the scheduling problem as a decentralized optimization problem in which the learning agent is assigned to all production units and learns asynchronously from the data collected on all of them. GraSP-RL is then applied to a complex injection molding production environment with 30 jobs and 4 machines, where the task is to minimize the makespan of the production plan. The schedule planned by GraSP-RL is compared with a priority dispatch rule, first-in-first-out (FIFO), and with metaheuristics such as tabu search (TS) and the genetic algorithm (GA). The proposed GraSP-RL outperforms FIFO, TS, and GA on the trained task of planning 30 jobs in the JSSP. We further test the generalization capability of the trained agent on two different problem classes: the open shop system (OSS) and the reactive JSSP (RJSSP), where our method produces better results than FIFO and results comparable to TS and GA.
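The abstract describes a pipeline in which the scheduling state is represented as a graph, encoded into Euclidean node embeddings by a GNN, and then consumed by an RL agent that selects the next job. The sketch below is a minimal, illustrative toy (not the authors' implementation): it builds a small job-machine graph, applies one hand-rolled GCN-style propagation step with random placeholder weights instead of a trained network, and picks the next job with an epsilon-greedy rule over linear Q-value readouts. All sizes, features, and weights are assumptions made only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

n_jobs, n_machines = 5, 2
n_nodes = n_jobs + n_machines            # job nodes followed by machine nodes

# Adjacency: connect every job node to every machine node (toy assumption),
# then symmetrize and add self-loops.
A = np.zeros((n_nodes, n_nodes))
A[:n_jobs, n_jobs:] = 1.0
A = A + A.T + np.eye(n_nodes)

# Raw node features, e.g. [remaining processing time, unit utilization].
X = rng.random((n_nodes, 2))

# One GCN-style propagation step: ReLU(D^-1/2 A D^-1/2 X W),
# with W a random stand-in for learned weights.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
W = rng.random((2, 4))
H = np.maximum(D_inv_sqrt @ A @ D_inv_sqrt @ X @ W, 0.0)

# Linear Q-value readout over the job-node embeddings (placeholder weights
# standing in for the trained RL agent's value head).
w_q = rng.random(4)
q_values = H[:n_jobs] @ w_q

# Epsilon-greedy selection of the next job to dispatch.
epsilon = 0.1
if rng.random() < epsilon:
    next_job = int(rng.integers(n_jobs))
else:
    next_job = int(np.argmax(q_values))

print(f"Q-values per job: {np.round(q_values, 3)}, next job: {next_job}")
```

In the full framework, the placeholder weights would be learned, and the same agent would be shared across all production units, updating asynchronously from the experience each unit collects.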