Span-based Localizing Network for Natural Language Video Localization

Paper Title

Span-based Localizing Network for Natural Language Video Localization

Authors

Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

Abstract

Given an untrimmed video and a text query, natural language video localization (NLVL) is to locate a matching span from the video that semantically corresponds to the query. Existing solutions formulate NLVL either as a ranking task and apply multimodal matching architecture, or as a regression task to directly regress the target video span. In this work, we address NLVL task with a span-based QA approach by treating the input video as text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework, to address NLVL. The proposed VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy. The QGH guides VSLNet to search for matching video span within a highlighted region. Through extensive experiments on three benchmark datasets, we show that the proposed VSLNet outperforms the state-of-the-art methods; and adopting span-based QA framework is a promising direction to solve NLVL.
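
The span-based QA formulation in the abstract can be illustrated with a minimal sketch of span decoding. It assumes a model has already produced one start score and one end score per video feature position, and it picks the pair with the highest combined score such that the start does not come after the end. The function names `best_span` and `span_to_seconds`, the score values, and the uniform mapping from feature index to seconds are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float]) -> Tuple[int, int]:
    """Return (start, end) maximizing start_scores[s] + end_scores[e] with s <= e."""
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    best_start = 0  # index of the highest start score among positions 0..e
    for e in range(len(end_scores)):
        if start_scores[e] > start_scores[best_start]:
            best_start = e
        score = start_scores[best_start] + end_scores[e]
        if score > best_score:
            best_score = score
            best = (best_start, e)
    return best


def span_to_seconds(span: Tuple[int, int], num_features: int, duration: float) -> Tuple[float, float]:
    """Map feature indices to seconds, assuming features are evenly spaced (an assumption)."""
    start, end = span
    unit = duration / num_features
    return start * unit, (end + 1) * unit


# Hypothetical scores over 4 feature positions of a 20-second video.
span = best_span([0.1, 2.0, 0.3, 0.2], [1.5, 0.2, 0.4, 1.8])
print(span, span_to_seconds(span, num_features=4, duration=20.0))
```

The single pass keeps the best start seen so far, so the ordering constraint is enforced without comparing every pair of positions. In this framing, a highlighting step such as QGH would narrow where the high-scoring positions fall before this decoding is applied.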
