Title
How to Train your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning
Authors
Abstract
We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems, and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the resulting control policies are not always smooth. This lack of smoothness can be a major problem for learned controllers intended for deployment on real hardware, as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation, because simulators are ultimately imperfect representations of reality, a discrepancy known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, "REinforcement-based transferable Agents through Learning" (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed smoothness issues in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight-capable, with minimal degradation in controller quality upon transfer. RE+AL agents also learn to outperform a tuned PID controller, achieving lower tracking error, smoother control, and reduced power consumption.
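To make the smoothness issue above concrete, the sketch below shows one common form of reward shaping for low-level attitude control. It is an illustrative assumption on our part, not the reward actually used by RE+AL or Neuroflight: it penalizes both attitude-rate tracking error and the step-to-step change in motor commands, which discourages the kind of oscillatory actuation that can destabilize a quadrotor and damage motors. The function name, weights, and signal layout are all hypothetical.

```python
import numpy as np

def shaped_reward(rate_error, motor_cmd, prev_motor_cmd,
                  w_track=1.0, w_smooth=0.5):
    """Illustrative reward for low-level attitude control.

    Hypothetical sketch; not the reward defined by RE+AL or Neuroflight.

    rate_error:     (3,) array, desired minus measured body rates (rad/s)
    motor_cmd:      (4,) array, current normalized motor outputs in [0, 1]
    prev_motor_cmd: (4,) array, motor outputs from the previous control step
    """
    # Standard tracking term: penalize deviation from the commanded rates.
    tracking_penalty = w_track * np.sum(np.abs(rate_error))
    # Smoothness term: penalize the change in actuation between consecutive
    # steps. High-frequency flapping of motor commands is what can cause
    # instability and hardware failure on a real vehicle.
    smoothness_penalty = w_smooth * np.sum(np.abs(motor_cmd - prev_motor_cmd))
    return -(tracking_penalty + smoothness_penalty)

# Example: at identical tracking error, a policy holding a steady output
# scores better than one that swings its motor commands each step.
err = np.array([0.1, -0.05, 0.02])
print(shaped_reward(err, np.full(4, 0.6), np.full(4, 0.6)))  # smooth policy
print(shaped_reward(err, np.full(4, 0.9), np.full(4, 0.3)))  # oscillatory policy
```

Under shaping of this kind, a policy that flips its motor outputs every control step scores strictly worse than one holding a steady output at the same tracking error, which is exactly the behavior a smoothness term is meant to select against.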