Paper Title
View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors
Paper Authors
Paper Abstract
Point cloud based methods have produced promising results in areas such as 3D object detection for autonomous driving. However, most recent point cloud work focuses on single depth sensor data, whereas less work has been done on indoor monitoring applications, such as operating room monitoring in hospitals or indoor surveillance. In these scenarios, multiple cameras are often used to tackle occlusion problems. We propose an end-to-end multi-person 3D pose estimation network, Point R-CNN, using multiple point cloud sources. We conduct extensive experiments to simulate challenging real-world cases, such as individual camera failures, varied target appearances, and complex cluttered scenes, with the CMU Panoptic dataset and the MVOR operating room dataset. Unlike most previous methods, which attempt to use multi-sensor information by building complex fusion models that often lead to poor generalization, we take advantage of the efficiency of concatenating point clouds to fuse the information at the input level. At the same time, we show that our end-to-end network greatly outperforms cascaded state-of-the-art models.
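The input-level fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sensor's extrinsic calibration (rotation `R`, translation `t` to a shared world frame) is known, transforms each cloud into that frame, and simply concatenates the points. The function name and argument layout are hypothetical.

```python
import numpy as np

def fuse_point_clouds(clouds, extrinsics):
    """Input-level fusion of multiple depth sensors' point clouds.

    clouds     : list of (N_i, 3) arrays, each in its sensor's local frame
    extrinsics : list of (R, t) pairs per sensor, where R is a (3, 3)
                 rotation and t a (3,) translation into the world frame
    Returns a single (sum N_i, 3) array in the shared world frame.
    """
    fused = []
    for pts, (R, t) in zip(clouds, extrinsics):
        # Transform points from the sensor frame to the world frame.
        fused.append(pts @ R.T + t)
    return np.concatenate(fused, axis=0)
```

One property worth noting: a failed camera can be represented as an empty `(0, 3)` cloud, and the concatenation degrades gracefully to the remaining sensors, which matches the single-camera-failure scenario the abstract simulates.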