Paper Title

MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

Paper Authors

Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

Paper Abstract

Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the lightweight shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by up to 70%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in terms of task accuracy while also reducing FLOPs by up to 90%.
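To make the abstract's slow-fast design concrete, below is a minimal sketch of how the inter-frame attention step might look in PyTorch: features from the lightweight network on a non-keyframe attend to cached keyframe features produced by the slow network. The class name, tensor shapes, and the use of standard global multi-head attention (rather than the paper's efficient local attention) are all illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Illustrative sketch only: names, shapes, and hyperparameters are assumptions
# inferred from the abstract, not the paper's actual architecture.
import torch
import torch.nn as nn


class InterFrameAttention(nn.Module):
    """Propagate cached keyframe features to a non-keyframe via attention.

    Note: this uses global multi-head attention for simplicity; the paper
    describes an *efficient local* attention, which would restrict each
    query to a spatial neighborhood of the keyframe feature map.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, fast_feat: torch.Tensor, key_feat: torch.Tensor) -> torch.Tensor:
        # fast_feat: (B, C, H, W) from the lightweight network on the current frame
        # key_feat:  (B, C, H, W) cached from the slow network on the last keyframe
        b, c, h, w = fast_feat.shape
        q = fast_feat.flatten(2).transpose(1, 2)   # (B, HW, C) queries
        kv = key_feat.flatten(2).transpose(1, 2)   # (B, HW, C) keys/values
        out, _ = self.attn(q, kv, kv)              # pull in keyframe information
        # Residual connection keeps the fast network's own features
        return out.transpose(1, 2).reshape(b, c, h, w) + fast_feat


if __name__ == "__main__":
    attn = InterFrameAttention(dim=64)
    fast = torch.randn(1, 64, 16, 16)   # shallow features on a non-keyframe
    key = torch.randn(1, 64, 16, 16)    # deep features cached from a keyframe
    print(attn(fast, key).shape)        # torch.Size([1, 64, 16, 16])
```

The FLOP savings claimed in the abstract come from this division of labor: the expensive slow network runs only on sparse keyframes, whose features are cached and reused, while every other frame pays only for the shallow network plus the attention step. In a task-specific variant, one such attention module would presumably be learned per task.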
