Paper Title

Query Twice: Dual Mixture Attention Meta Learning for Video Summarization

Authors

Junyan Wang, Yang Bai, Yang Long, Bingzhang Hu, Zhenhua Chai, Yu Guan, Xiaolin Wei

Abstract

Video summarization aims to select representative frames that retain high-level information, and is usually solved by predicting segment-wise importance scores via a softmax function. However, the softmax function struggles to retain high-rank representations of complex visual or sequential information, which is known as the Softmax Bottleneck problem. In this paper, we propose a novel framework, the Dual Mixture Attention (DMASum) model with Meta Learning for video summarization, that tackles the softmax bottleneck. Its Mixture of Attention (MoA) layer effectively increases model capacity by applying self-query attention twice, capturing second-order changes in addition to the initial query-key attention, and a novel Single Frame Meta Learning rule is introduced to improve generalization on small datasets with limited training sources. Furthermore, DMASum exploits both visual and sequential attention, connecting local key-frame and global attention in an accumulative way. We adopt the new evaluation protocol on two public datasets, SumMe and TVSum. Both qualitative and quantitative experiments show significant improvements over state-of-the-art methods.
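The softmax bottleneck the abstract refers to is the observation that a single softmax over scores produces a log-attention matrix of limited rank, which can be too restrictive for complex visual or sequential structure; mixing several softmax attention maps with input-dependent weights is the standard remedy. The sketch below illustrates that mixture-of-softmaxes idea only; the projection matrices, mixing weights, and function names are illustrative assumptions, not the paper's actual DMASum/MoA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_attention(query, key, num_mixtures=4, rng=None):
    """Mixture-of-softmaxes attention (illustrative sketch).

    K separate softmax attention maps are combined with query-dependent
    mixing weights pi_k. A single softmax yields a low-rank log-attention
    matrix (the softmax bottleneck); a weighted mixture of softmaxes can
    exceed that rank bound. Projections here are random placeholders,
    standing in for learned parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = query.shape
    # Hypothetical per-mixture score projections (learned in a real model).
    proj = rng.standard_normal((num_mixtures, d, d)) / np.sqrt(d)
    # Mixing weights pi_k computed from the query itself, one per frame.
    mix_proj = rng.standard_normal((d, num_mixtures)) / np.sqrt(d)
    pi = softmax(query @ mix_proj, axis=-1)              # (n, K)
    attn = np.zeros((n, n))
    for k in range(num_mixtures):
        scores = (query @ proj[k]) @ key.T / np.sqrt(d)  # (n, n)
        attn += pi[:, k:k + 1] * softmax(scores, axis=-1)
    return attn  # each row is still a valid distribution over frames
```

Because the mixing weights of each row sum to one and each component softmax row sums to one, the mixed attention rows remain valid probability distributions while the log of the mixture is no longer constrained to the low rank of a single softmax.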
