Bridging the Sim2Real Gap Using Image Gradients for the Task of End-to-End Autonomous Driving

Paper title

Bridging Sim2Real Gap Using Image Gradients for the Task of End-to-End Autonomous Driving

Authors

Nair, Unnikrishnan R; Sharma, Sarthak; Parihar, Udit Singh; Menon, Midhun S; Vidapanakal, Srikanth

Abstract

We present the first-prize solution to the NeurIPS 2021 AWS DeepRacer Challenge. In this competition, the task was to train a reinforcement learning agent (i.e., an autonomous car) that learns to drive by interacting with its environment, a simulated track, taking an action in each state so as to maximize the expected reward. The model was then tested on a real-world track with a miniature AWS DeepRacer car. Our goal was to train a model that completes a lap as fast as possible without leaving the track. The DeepRacer Challenge is part of a series of embodied intelligence competitions in the field of autonomous vehicles called the AI Driving Olympics (AI-DO). The overall objective of the AI-DO is to provide accessible mechanisms for benchmarking progress in autonomy applied to the task of autonomous driving. The difficult part of this challenge was the sim2real transfer of the learned skills. To reduce the domain gap in the observation space, we applied Canny edge detection in addition to cropping out unnecessary background information. We modeled the problem as a behavioral cloning task and used MLP-Mixer to optimize for runtime. We made the model robust to control noise by carefully filtering the training data, which yielded a model capable of completing the track even when 50% of the commands were randomly changed. The overall runtime of the model was only 2-3 ms on a modern CPU.
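The observation preprocessing described in the abstract (crop away background, then keep only edges) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract names Canny edge detection, whereas the code below swaps in a simpler thresholded gradient magnitude with no smoothing, non-maximum suppression, or hysteresis. The function names, the crop bounds, and the threshold are hypothetical and chosen only for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def crop_rows(image: Image, top: int, bottom: int) -> Image:
    """Keep rows top..bottom-1, discarding background above and below."""
    return [row[:] for row in image[top:bottom]]


def gradient_edges(image: Image, threshold: float) -> List[List[int]]:
    """Binary edge map from the central-difference gradient magnitude.

    Simplified stand-in for Canny (assumption, not the paper's method).
    Border pixels stay 0 because they lack neighbours on both sides.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # horizontal gradient
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0  # vertical gradient
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def preprocess(image: Image, top: int, bottom: int, threshold: float) -> List[List[int]]:
    """Crop first, then extract edges, mirroring the order in the abstract."""
    return gradient_edges(crop_rows(image, top, bottom), threshold)
```

Because an edge map discards color and texture and keeps mostly the track boundaries, simulated and real frames plausibly look more alike after this step than the raw images do, which is the intuition behind using image gradients to narrow the domain gap.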
