Paper Title

Practical Reinforcement Learning For MPC: Learning from sparse objectives in under an hour on a real robot

Authors

Napat Karnchanachari, Miguel I. Valls, David Hoeller, Marco Hutter

Abstract

Model Predictive Control (MPC) is a powerful control technique that handles constraints, takes the system's dynamics into account, and optimizes for a given cost function. In practice, however, it often requires an expert to craft and tune this cost function and find trade-offs between different state penalties to satisfy simple high-level objectives. In this paper, we use Reinforcement Learning, and in particular value learning, to approximate the value function given only high-level objectives, which can be sparse and binary. Building on previous work, we present improvements that allowed us to successfully deploy the method on a real-world unmanned ground vehicle. Our experiments show that our method can learn the cost function from scratch and without human intervention, while reaching a performance level similar to that of an expert-tuned MPC. We perform a quantitative comparison of these methods with standard MPC approaches both in simulation and on the real robot.
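The core idea of the abstract, learning a value function purely from a sparse, binary goal signal and then using it in place of a hand-tuned cost, can be illustrated with a toy sketch. Everything below (the 1-D point-mass dynamics, the grid discretization, and all function names) is an illustrative assumption for this note, not the authors' implementation, which uses a learned function approximator and a full constrained MPC on a real vehicle:

```python
import numpy as np

# Toy 1-D point mass: state x = (position, velocity), action u = acceleration.
# These dynamics, the grid, and all names are illustrative assumptions only.
DT = 0.1
GAMMA = 0.95

POS = np.linspace(-1.0, 1.0, 21)    # discretized positions
VEL = np.linspace(-1.0, 1.0, 21)    # discretized velocities
ACTIONS = np.linspace(-1.0, 1.0, 5)

def step(x, u):
    """Euler-integrated point-mass dynamics."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u])

def sparse_reward(x):
    """Binary high-level objective: 1 inside a small goal region, else 0."""
    return 1.0 if abs(x[0]) < 0.05 and abs(x[1]) < 0.05 else 0.0

def idx(x):
    """Snap a continuous state to the nearest grid cell (clips at the border)."""
    return (int(np.abs(POS - x[0]).argmin()), int(np.abs(VEL - x[1]).argmin()))

# Fitted value iteration on the grid, bootstrapped from the sparse reward alone:
# no state penalties or trade-off weights are ever specified by hand.
V = np.zeros((len(POS), len(VEL)))
for _ in range(80):
    V_new = np.empty_like(V)
    for i, p in enumerate(POS):
        for j, v in enumerate(VEL):
            x = np.array([p, v])
            V_new[i, j] = max(
                sparse_reward(x) + GAMMA * V[idx(step(x, u))] for u in ACTIONS
            )
    V = V_new

def controller_action(x):
    """One-step lookahead: the learned value stands in for the expert-tuned
    terminal cost and summarizes the long-term objective."""
    return max(
        ACTIONS,
        key=lambda u: sparse_reward(step(x, u)) + GAMMA * V[idx(step(x, u))],
    )
```

In the paper itself the value function is learned from experience rather than by sweeping a grid, and the one-step lookahead is replaced by a full MPC that handles constraints and dynamics over a horizon; the tabular sketch only shows why no manual cost tuning is needed — the binary goal signal alone shapes the learned cost.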
