Title
Counterfactual Policy Evaluation for Decision-Making in Autonomous Driving
Authors
Abstract
Learning-based approaches, such as reinforcement and imitation learning, are gaining popularity in decision-making for autonomous driving. However, learned policies often fail to generalize and cannot handle novel situations well. Asking and answering questions of the form "Would the policy have performed well if the other agents had behaved differently?" can shed light on whether a policy has seen similar situations during training and generalizes well. In this work, a counterfactual policy evaluation is introduced that makes use of counterfactual worlds, i.e., worlds in which the behaviors of the other agents are non-actual. If a policy handles all counterfactual worlds well, it has either seen similar situations during training or it generalizes well, and it is deemed fit enough to be executed in the actual world. Additionally, performing the counterfactual policy evaluation makes causal relations evident, as well as the influence that changing one vehicle's behavior has on the surrounding vehicles. To validate the proposed method, we learn a policy for a lane-merging scenario using reinforcement learning. In the application phase, the policy is executed only after the counterfactual policy evaluation has been performed and only if the policy is found to be safe enough. We show that the proposed approach significantly decreases the collision rate while maintaining a high success rate.
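The evaluation loop described above can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: a one-step lane-merge abstraction in which the ego action is fixed and the other vehicle's behavior is counterfactually varied; the `rollout` model, the behavior labels, and the `safety_threshold` parameter are all hypothetical stand-ins for the simulator-based rollouts used in the work.

```python
def rollout(ego_action: str, other_behavior: str) -> bool:
    """Toy rollout model (assumed): return True if collision-free.

    In this simplified world, a collision occurs only when the ego
    merges while the other vehicle accelerates into the gap.
    """
    return not (ego_action == "go" and other_behavior == "accelerate")

def counterfactual_policy_evaluation(ego_action: str,
                                     counterfactual_behaviors: list,
                                     safety_threshold: float = 1.0) -> bool:
    """Deem the policy's action safe enough only if the fraction of
    collision-free counterfactual worlds reaches `safety_threshold`."""
    safe = sum(rollout(ego_action, b) for b in counterfactual_behaviors)
    return safe / len(counterfactual_behaviors) >= safety_threshold

# Counterfactual behaviors assigned to the other vehicle (assumed labels):
behaviors = ["yield", "accelerate", "keep_speed"]
print(counterfactual_policy_evaluation("wait", behaviors))  # True: safe in all worlds
print(counterfactual_policy_evaluation("go", behaviors))    # False: collides in one world
```

With the default threshold of 1.0, the action is executed only if it is collision-free in every counterfactual world; lowering the threshold trades safety for a higher success rate, mirroring the collision-rate/success-rate trade-off reported in the abstract.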