Paper Title
FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics
Paper Authors
Abstract
Detection-driven real-time video analytics requires continuous detection of objects contained in the video frames using deep learning models like YOLOv3 and EfficientDet. However, running these detectors on every frame in resource-constrained edge devices is computationally intensive. By taking the temporal correlation between consecutive video frames into account, we note that detection outputs tend to be overlapping in successive frames. Elimination of similar consecutive frames will lead to a negligible drop in accuracy while offering significant performance benefits by reducing overall computation and communication costs. The key technical questions are, therefore, (a) how to identify which frames are to be processed by the object detector, and (b) how many successive frames can be skipped (called skip-length) once a frame is selected to be processed. The overall goal of the process is to keep the error due to skipping frames as small as possible. We introduce a novel error vs. processing rate optimization problem with respect to the object detection task that balances the error rate against the fraction of frames filtered. Subsequently, we propose an offline Reinforcement Learning (RL)-based algorithm to determine these skip-lengths as a state-action policy of the RL agent from a recorded video and then deploy the agent online for live video streams. To this end, we develop FrameHopper, an edge-cloud collaborative video analytics framework, that runs a lightweight trained RL agent on the camera and passes filtered frames to the server, where the object detection model runs for a set of applications. We have tested our approach on a number of live videos captured from real-life scenarios and show that FrameHopper processes only a handful of frames but produces detection results close to the oracle solution, outperforming recent state-of-the-art solutions in most cases.