Paper Title

Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control

Authors

Hamid Radmard Rahmani, Carsten Koenke, Marco A. Wiering

Abstract

In many reinforcement learning (RL) problems, it takes some time until an action taken by the agent reaches its maximum effect on the environment, and consequently the agent receives the reward corresponding to that action with a delay, called the action-effect delay. Such delays reduce the performance of the learning algorithm and increase the computational cost, because the reinforcement learning agent values immediate rewards more than the future rewards that are more closely related to the taken action. This paper addresses this issue by introducing an applicable enhanced Q-learning method in which, at the beginning of the learning phase, the agent takes a single action and builds a function that reflects the environment's response to that action, called the reflexive $\gamma$-function. During the training phase, the agent utilizes the created reflexive $\gamma$-function to update the Q-values. We have applied the developed method to a structural control problem in which the goal of the agent is to reduce the vibrations of a building subjected to earthquake excitations with a specified delay. Seismic control is considered a complex task in structural engineering because of the stochastic and unpredictable nature of earthquakes and the complex behavior of the structure. Three scenarios are presented to study the effects of zero, medium, and long action-effect delays, and the performance of the enhanced method is compared to the standard Q-learning method. Both RL methods use a neural network to learn to estimate the state-action value function that is used to control the structure. The results show that the enhanced method significantly outperforms the original method in all cases, and also improves the stability of the algorithm in dealing with action-effect delays.
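The abstract does not give the exact update rule, so the following Python sketch only illustrates the general idea: probe the environment once with a single action, record the reward response over a finite horizon, normalize it into a weighting function, and use that function to weight delayed rewards back onto the action that caused them. The names `build_reflexive_gamma` and `filtered_return`, the gym-style `env.step`/`env.reset` interface, and the "null" action `0` are assumptions for illustration, not details from the paper.

```python
import numpy as np

def build_reflexive_gamma(env, probe_action, horizon):
    """Probe the environment once with a single action and record the reward
    response over `horizon` steps, then normalize it into a finite weighting
    function (a reward-response filter). Illustrative sketch only; the paper's
    construction of the reflexive gamma-function may differ.
    """
    env.reset()
    env.step(probe_action)                 # the single probing action
    response = []
    for _ in range(horizon):
        _, reward, done, *_ = env.step(0)  # assumed "null" action while the response plays out
        response.append(abs(reward))
        if done:
            break
    response = np.asarray(response, dtype=float)
    total = response.sum()
    return response / total if total > 0 else np.ones_like(response) / len(response)

def filtered_return(rewards, reflexive_gamma):
    """Weight the upcoming rewards by the reflexive gamma-function so the
    delayed effect of an action is credited back to that action."""
    k = min(len(rewards), len(reflexive_gamma))
    return float(np.dot(rewards[:k], reflexive_gamma[:k]))
```

In this reading, the filtered return would replace the single immediate reward in the temporal-difference target of an otherwise standard (neural-network-based) Q-learning update, so that an action's Q-value is driven by the rewards it actually produced after the action-effect delay.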
