Paper Title

Taming Lagrangian Chaos with Multi-Objective Reinforcement Learning

Authors

Chiara Calascibetta, Luca Biferale, Francesco Borra, Antonio Celani, Massimo Cencini

Abstract

We consider the problem of two active particles in 2D complex flows with the multi-objective goal of minimizing both the dispersion rate and the energy consumption of the pair. We approach the problem by means of Multi-Objective Reinforcement Learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, $\tau$. We show that there is a range of decision times, in between the Lyapunov time and the continuous-updating limit, where Reinforcement Learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller $\tau$ all a priori heuristic strategies become Pareto optimal.
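The core machinery described above, scalarization combined with Q-learning, sweeps a weight over the two objectives and trains one policy per weight, then keeps the non-dominated outcomes as the Pareto set. The following is a minimal sketch of that idea on an invented toy problem (a single-state "swimming speed" choice with made-up anti-dispersion and energy rewards), not the paper's flow model or algorithmic details:

```python
import random

# Toy stand-in: the agent picks a swimming speed (its control variable).
# Objective 1 rewards limiting pair dispersion (here, proportional to speed);
# objective 2 penalizes energy consumption (here, speed squared).
# Both reward models are invented for illustration.
SPEEDS = [0.0, 0.5, 1.0, 1.5, 2.0]

def rewards(speed):
    """Return the two objective components (anti-dispersion, minus energy)."""
    return speed, -speed**2

def q_learning_scalarized(w, episodes=2000, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning on the linearly scalarized reward w*r1 + (1-w)*r2."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in SPEEDS}
    for _ in range(episodes):
        # epsilon-greedy action selection
        a = rng.choice(SPEEDS) if rng.random() < eps else max(q, key=q.get)
        r1, r2 = rewards(a)
        r = w * r1 + (1 - w) * r2
        q[a] += alpha * (r - q[a])   # one-step (bandit) update, gamma = 0
    return max(q, key=q.get)          # greedy action for this weight

def pareto_front(points):
    """Keep points not dominated in both objectives (larger is better)."""
    return sorted({p for p in points
                   if not any(o != p and o[0] >= p[0] and o[1] >= p[1]
                              for o in points)})

# Sweep the scalarization weight to trace the trade-off solutions.
solutions = [rewards(q_learning_scalarized(i / 10)) for i in range(11)]
front = pareto_front(solutions)
```

Each weight selects a different compromise between keeping the pair together and saving energy; the extreme weights recover the pure single-objective optima, and the set of distinct greedy outcomes forms the (toy) Pareto frontier.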
