Title
Zero-Shot Activity Recognition with Videos
Authors
Abstract
In this paper, we examine the zero-shot activity recognition task using videos. We introduce an autoencoder-based model that constructs a joint multimodal embedding space between the visual and textual manifolds. On the visual side, we extract features from activity videos with a state-of-the-art 3D convolutional action recognition network. On the textual side, we use GloVe word embeddings. Zero-shot recognition results are evaluated by top-n accuracy, and the manifold learning ability is measured by mean Nearest Neighbor Overlap. Finally, we provide an extensive discussion of the results and future directions.
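The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names are hypothetical, classification is assumed to be cosine-similarity ranking of projected video embeddings against class-label embeddings, and the neighbor graphs for mean Nearest Neighbor Overlap are assumed to use Euclidean distance.

```python
import numpy as np

def top_n_accuracy(video_embs, label_embs, true_idx, n=5):
    """Fraction of videos whose true class label is among the n
    most cosine-similar label embeddings (assumed ranking scheme)."""
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    t = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = v @ t.T                              # (num_videos, num_classes)
    topn = np.argsort(-sims, axis=1)[:, :n]     # indices of n best classes
    hits = [true_idx[i] in topn[i] for i in range(len(true_idx))]
    return float(np.mean(hits))

def mean_nn_overlap(emb_a, emb_b, k=5):
    """Average overlap between each item's k nearest neighbors
    (self excluded) computed in two embedding spaces."""
    def knn(e):
        d = np.linalg.norm(e[:, None] - e[None, :], axis=2)
        np.fill_diagonal(d, np.inf)             # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn(emb_a), knn(emb_b)
    overlaps = [len(set(na[i]) & set(nb[i])) / k for i in range(len(emb_a))]
    return float(np.mean(overlaps))
```

Identical spaces give an overlap of 1.0, so mean Nearest Neighbor Overlap measures how well the learned joint space preserves the neighborhood structure of the original (e.g., textual) manifold.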