Modality-Buffet for Real-Time Object Detection

Paper title

Modality-Buffet for Real-Time Object Detection

Authors

Nicolai Dorka, Johannes Meyer, Wolfram Burgard

Abstract

Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors using different modalities and with varying computational complexities offer different trade-offs. One option is to have a very lightweight model that can predict from all modalities at once for each frame. However, in some situations (e.g., in static scenes) it might be better to have a more complex but more accurate model and to extrapolate from previous predictions for the frames coming in at processing time. We formulate this task as a sequential decision making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to take for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach on the Waymo Open Dataset and show that it exceeds the performance of each single detector.
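
The selection loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Detector` class, the `latency_frames` cost, the scalar "motion score" standing in for an RGB frame, and the hand-written `threshold_policy` (used here in place of a trained RL policy) are all hypothetical names and assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Detector:
    name: str
    latency_frames: int  # hypothetical cost: frames consumed by one run


def schedule_detectors(
    frames: Sequence[float],
    detectors: Sequence[Detector],
    policy: Callable[[float], int],
) -> List[Tuple[str, str]]:
    """Label each frame with the chosen detector and how its prediction arises.

    Simplifying assumption: the frame on which a detector is started counts
    as "detected"; frames arriving while it is still busy are "extrapolated".
    """
    schedule: List[Tuple[str, str]] = []
    t = 0
    while t < len(frames):
        # The policy sees only the current frame and returns a portfolio index.
        detector = detectors[policy(frames[t])]
        schedule.append((detector.name, "detected"))
        # Frames that arrive during processing, clipped to the sequence end.
        busy = max(0, min(detector.latency_frames - 1, len(frames) - t - 1))
        schedule.extend([(detector.name, "extrapolated")] * busy)
        t += 1 + busy
    return schedule


def threshold_policy(motion: float) -> int:
    """Stand-in for a learned policy: fast detector (0) when the scene moves."""
    return 0 if motion > 0.5 else 1


# Hypothetical two-detector portfolio.
PORTFOLIO = [Detector("fast", 1), Detector("accurate", 3)]
```

Calling `schedule_detectors([0.9, 0.1, 0.2, 0.3, 0.8], PORTFOLIO, threshold_policy)` picks the fast detector on the dynamic frames and the slower one on the static stretch, where two of the frames are covered by extrapolation. In the paper's setting the rule would instead be learned with RL to maximize per-image accuracy.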
