Policy Learning for Active Target Tracking over Continuous SE(3) Trajectories

Title

Policy Learning for Active Target Tracking over Continuous SE(3) Trajectories

Authors

Pengzhi Yang, Shumon Koga, Arash Asgharivaskasi, Nikolay Atanasov

Abstract

This paper proposes a novel model-based policy gradient algorithm for tracking dynamic targets using a mobile robot, equipped with an onboard sensor with limited field of view. The task is to obtain a continuous control policy for the mobile robot to collect sensor measurements that reduce uncertainty in the target states, measured by the target distribution entropy. We design a neural network control policy with the robot $SE(3)$ pose and the mean vector and information matrix of the joint target distribution as inputs and attention layers to handle variable numbers of targets. We also derive the gradient of the target entropy with respect to the network parameters explicitly, allowing efficient model-based policy gradient optimization.
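
To make the entropy objective concrete, here is a minimal sketch of how the differential entropy of a target distribution can be computed from its information matrix. It assumes the joint target distribution is Gaussian, which the abstract does not state explicitly, and the function names `cholesky_logdet` and `gaussian_entropy_from_information` are hypothetical, not taken from the paper. For an $n$-dimensional Gaussian with information matrix $Y$ (the inverse covariance), the entropy is $\frac{n}{2}\log(2\pi e) - \frac{1}{2}\log\det Y$, so measurements that increase $\det Y$ reduce the entropy.

```python
import math


def cholesky_logdet(a):
    """Return log(det(a)) for a symmetric positive-definite matrix (nested lists)."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contribution of the already-computed columns.
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    # det(a) = prod(diag(L))^2, so log det(a) = 2 * sum(log(diag(L))).
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def gaussian_entropy_from_information(info):
    """Differential entropy (in nats) of a Gaussian with information matrix `info`.

    Assumption: the target distribution is Gaussian, so
    H = n/2 * log(2*pi*e) - 1/2 * log det(info).
    """
    n = len(info)
    return 0.5 * n * math.log(2.0 * math.pi * math.e) - 0.5 * cholesky_logdet(info)


if __name__ == "__main__":
    prior = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical posterior after an informative measurement: more information.
    posterior = [[2.0, 1.0], [1.0, 2.0]]
    print(gaussian_entropy_from_information(prior))
    print(gaussian_entropy_from_information(posterior))
```

In this sketch the posterior entropy is lower than the prior entropy because the determinant of the information matrix grew from 1 to 3. That is the quantity a tracking policy of the kind described above would try to drive down.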
