Paper Title

COVID-19 Pandemic Cyclic Lockdown Optimization Using Reinforcement Learning

Authors

Mauricio Arango, Lyudmil Pelov

Abstract

This work examines the use of reinforcement learning (RL) to optimize cyclic lockdowns, one of the methods available for controlling the COVID-19 pandemic. The problem is structured as an optimal control system that tracks a reference value corresponding to the maximum usage level of a critical resource, such as ICU beds. Instead of conventional optimal control methods, however, RL is used to find optimal control policies. A framework was developed to calculate optimal cyclic lockdown timings using an RL-based on-off controller. The controller is implemented as an RL agent that interacts with an epidemic simulator built on an extended SEIR epidemic model. The RL agent learns a policy function that produces a sequence of open/lockdown decisions optimizing the goals specified in the RL reward function. Two concurrent goals were used: a public health goal that minimizes overshoots of ICU bed usage above an ICU bed threshold, and a socio-economic goal that minimizes the time spent under lockdown. Cyclic lockdowns are assumed to be a temporary alternative to extended lockdowns, considered when a region faces imminent danger of exceeding resource capacity limits and when an extended lockdown would cause severe social and economic consequences because the region lacks the economic resources to support its affected population for the duration.
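
The interaction described in the abstract (an on-off controller acting on an SEIR-type simulator, scored by a reward with a health term and a socio-economic term) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the plain four-compartment SEIR model, every parameter value (`beta_open`, `beta_lock`, `sigma`, `gamma`, `icu_fraction`, `icu_threshold`, the reward weights), and all function names are hypothetical. A hand-written threshold rule stands in where a learned RL policy would go.

```python
from dataclasses import dataclass


@dataclass
class SEIRState:
    s: float  # susceptible fraction of the population
    e: float  # exposed fraction
    i: float  # infectious fraction
    r: float  # removed (recovered or deceased) fraction


def seir_step(state, lockdown, beta_open=0.30, beta_lock=0.08,
              sigma=0.2, gamma=0.1, dt=1.0):
    """Advance a basic SEIR model by one Euler step (all rates are assumed values)."""
    beta = beta_lock if lockdown else beta_open  # lockdown lowers the contact rate
    new_exposed = beta * state.s * state.i * dt
    new_infectious = sigma * state.e * dt
    new_removed = gamma * state.i * dt
    return SEIRState(
        s=state.s - new_exposed,
        e=state.e + new_exposed - new_infectious,
        i=state.i + new_infectious - new_removed,
        r=state.r + new_removed,
    )


def icu_demand(state, icu_fraction=0.01):
    """Assumed ICU demand: a fixed share of the infectious compartment."""
    return icu_fraction * state.i


def reward(state, lockdown, icu_threshold=0.0005, w_health=1000.0, w_econ=0.1):
    """Penalize ICU usage above the threshold and each step spent in lockdown."""
    overshoot = max(0.0, icu_demand(state) - icu_threshold)
    return -(w_health * overshoot) - (w_econ if lockdown else 0.0)


def threshold_policy(state, icu_threshold=0.0005):
    """Hand-written on-off rule used here as a placeholder for a learned policy."""
    return icu_demand(state) > 0.8 * icu_threshold


def run_episode(policy, days=365):
    """Roll the simulator forward, applying one open/lockdown decision per day."""
    state = SEIRState(s=0.999, e=0.0005, i=0.0005, r=0.0)
    total_reward, lockdown_days = 0.0, 0
    for _ in range(days):
        lockdown = policy(state)
        state = seir_step(state, lockdown)
        total_reward += reward(state, lockdown)
        lockdown_days += int(lockdown)
    return total_reward, lockdown_days, state


if __name__ == "__main__":
    total, locked, final = run_episode(threshold_policy)
    print(f"total reward: {total:.3f}, lockdown days: {locked}")
```

In an RL setting, `threshold_policy` would be replaced by a policy trained to maximize the episode's total reward, and the relative sizes of `w_health` and `w_econ` would determine how the two goals are traded off.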
