FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

Paper title

FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

Authors

Aman Sinha, Matthew O'Kelly, Hongrui Zheng, Rahul Mangharam, John Duchi, Russ Tedrake

Abstract

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
