Condensed Movies: Story Based Retrieval with Contextual Embeddings

Title

Condensed Movies: Story Based Retrieval with Contextual Embeddings

Authors

Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman

Abstract

Our objective in this work is long-range understanding of the narrative structure of movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of the movie, providing a condensed look at the full storyline. To this end, we make the following three contributions: (i) We create the Condensed Movies Dataset (CMD) consisting of the key scenes from over 3K movies: each key scene is accompanied by a high-level semantic description of the scene, character face-tracks, and metadata about the movie. The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. It is also an order of magnitude larger than existing movie datasets in the number of movies; (ii) We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding; and finally (iii) We demonstrate how the addition of context from other video clips improves retrieval performance.
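The abstract describes the retrieval baseline only at a high level: several cues (character, speech, visual) are combined into one video embedding, text descriptions are matched against it, and context from other clips is added. The following is a minimal illustrative sketch of that kind of pipeline under stated assumptions, not the authors' network. The weighted-sum fusion, the "blend with the mean of the other clips" context step, the `alpha` parameter, and every function name are hypothetical choices, and the per-cue embeddings are assumed to come from pretrained encoders already projected into the same space as the text query.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(experts: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> Vector:
    """Hypothetical fusion: weighted sum of L2-normalized per-cue embeddings.

    Assumes every cue (e.g. "visual", "speech", "character") is already in the
    shared text/video space. Requires at least one cue.
    """
    dim = len(next(iter(experts.values())))
    fused = [0.0] * dim
    for name, emb in experts.items():
        w = weights.get(name, 0.0)  # cues without a weight contribute nothing
        for i, x in enumerate(l2_normalize(emb)):
            fused[i] += w * x
    return l2_normalize(fused)


def add_context(clips: List[Vector], alpha: float = 0.25) -> List[Vector]:
    """Hypothetical context step: blend each clip embedding with the mean
    embedding of the other clips from the same movie, weighted by alpha."""
    out: List[Vector] = []
    for i, clip in enumerate(clips):
        others = [c for j, c in enumerate(clips) if j != i]
        if not others:  # a movie with a single clip has no context to add
            out.append(list(clip))
            continue
        mean = [sum(col) / len(others) for col in zip(*others)]
        mixed = [(1 - alpha) * a + alpha * b for a, b in zip(clip, mean)]
        out.append(l2_normalize(mixed))
    return out


def recall_at_k(text_embs: List[Vector], video_embs: List[Vector], k: int) -> float:
    """Fraction of text queries whose paired video (same index) ranks in the top k."""
    hits = 0
    for i, query in enumerate(text_embs):
        scores = [dot(query, v) for v in video_embs]
        ranked = sorted(range(len(video_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)


# Toy usage: two clips, each described by two cues, and two text queries.
weights = {"visual": 0.5, "speech": 0.5}
videos = [
    fuse_modalities({"visual": [1.0, 0.0], "speech": [2.0, 0.0]}, weights),
    fuse_modalities({"visual": [0.0, 1.0], "speech": [0.0, 3.0]}, weights),
]
queries = [l2_normalize([0.9, 0.1]), l2_normalize([0.2, 0.8])]
print(recall_at_k(queries, videos, k=1))  # → 1.0
```

Recall@K, the fraction of queries whose correct clip appears among the top K results, is a common way to score text-to-video retrieval. In a trained model, the fixed cue weights and the simple context blend here would be replaced by learned components.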
