Title
Zero-Shot Activity Recognition with Videos
Authors
Abstract
In this paper, we examine the zero-shot activity recognition task using videos. We introduce an autoencoder-based model that constructs a joint multimodal embedding space between the visual and textual manifolds. On the visual side, we extract features from activity videos with a state-of-the-art 3D convolutional action recognition network. On the textual side, we use GloVe word embeddings. Zero-shot recognition results are evaluated by top-n accuracy, and the manifold learning ability is measured by mean Nearest Neighbor Overlap. Finally, we provide an extensive discussion of the results and future directions.
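The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names are hypothetical, classification is assumed to be cosine-similarity ranking of projected video embeddings against class-label embeddings, and the neighbor graphs for mean Nearest Neighbor Overlap are assumed to use Euclidean distance.

```python
import numpy as np

def top_n_accuracy(video_embs, label_embs, true_idx, n=5):
    """Fraction of videos whose true class label is among the n
    most cosine-similar label embeddings (assumed ranking scheme)."""
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    t = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = v @ t.T                              # (num_videos, num_classes)
    topn = np.argsort(-sims, axis=1)[:, :n]     # indices of n best classes
    hits = [true_idx[i] in topn[i] for i in range(len(true_idx))]
    return float(np.mean(hits))

def mean_nn_overlap(emb_a, emb_b, k=5):
    """Average overlap between each item's k nearest neighbors
    (self excluded) computed in two embedding spaces."""
    def knn(e):
        d = np.linalg.norm(e[:, None] - e[None, :], axis=2)
        np.fill_diagonal(d, np.inf)             # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn(emb_a), knn(emb_b)
    overlaps = [len(set(na[i]) & set(nb[i])) / k for i in range(len(emb_a))]
    return float(np.mean(overlaps))
```

Identical spaces give an overlap of 1.0, so mean Nearest Neighbor Overlap measures how well the learned joint space preserves the neighborhood structure of the original (e.g., textual) manifold.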