Paper Title
Validation of object detection in UAV-based images using synthetic data
Paper Authors
Paper Abstract
Object detection is increasingly used onboard Unmanned Aerial Vehicles (UAVs) for various applications; however, the machine learning (ML) models for UAV-based detection are often validated using data curated for tasks unrelated to the UAV application. This is a concern because neural networks trained on large-scale benchmarks have shown excellent capability in generic object detection tasks, yet conventional training approaches can lead to large inference errors for UAV-based images. Such errors arise from differences in imaging conditions between images captured by UAVs and images used in training. To overcome this problem, we characterize the boundary conditions of ML models, beyond which the models exhibit rapid degradation in detection accuracy. Our work focuses on understanding the impact of different UAV-based imaging conditions on detection performance, using synthetic data generated with a game engine. Properties of the game engine are exploited to populate the synthetic datasets with realistic, annotated images. Specifically, the engine enables fine control of various parameters, such as camera position, view angle, illumination conditions, and object pose. Using the synthetic datasets, we analyze detection accuracy under different imaging conditions as a function of the above parameters. We use three well-known neural network models of varying complexity in our work. In our experiments, we observe and quantify the following: 1) how detection accuracy drops as the camera moves toward the nadir-view region; 2) how detection accuracy varies with object pose; and 3) the degree to which the robustness of the models changes as illumination conditions vary.
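To make the experimental design concrete, below is a minimal Python sketch of the kind of parameter sweep the abstract describes: rendering annotated synthetic scenes across a grid of camera elevations, object poses, and illumination settings, then scoring a detector on each condition. The parameter ranges, function names, and engine/detector hooks are illustrative assumptions, not the paper's actual implementation.

import itertools

# Illustrative parameter grids mirroring the imaging conditions in the
# abstract; the actual ranges used in the paper are not specified here.
CAMERA_ELEVATIONS_DEG = [15, 30, 45, 60, 75, 90]  # 90 deg = nadir view
OBJECT_YAWS_DEG = [0, 45, 90, 135, 180]           # object pose
SUN_ELEVATIONS_DEG = [10, 30, 60]                 # illumination proxy

def render_scene(cam_elev, obj_yaw, sun_elev):
    """Stand-in for a game-engine render call (e.g., via a Python bridge
    such as UnrealCV or AirSim). Should return one image plus the
    engine's auto-generated ground-truth bounding boxes."""
    raise NotImplementedError("hook up the rendering engine here")

def detection_accuracy(model, image, gt_boxes):
    """Stand-in for the evaluation step: run the detector on the image
    and score its predictions against the ground-truth boxes (e.g.,
    average precision at IoU 0.5)."""
    raise NotImplementedError("hook up the detector and metric here")

def sweep(model):
    """Evaluate detection accuracy as a function of each imaging
    parameter, rendering one scene per parameter combination."""
    results = {}
    for params in itertools.product(CAMERA_ELEVATIONS_DEG,
                                    OBJECT_YAWS_DEG,
                                    SUN_ELEVATIONS_DEG):
        image, gt_boxes = render_scene(*params)
        results[params] = detection_accuracy(model, image, gt_boxes)
    return results

Plotting the resulting accuracy against each parameter in turn (with the others marginalized) is what lets one read off boundary conditions such as the drop near the nadir view.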