Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

Paper title

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

Authors

Otani, Mayu; Nakashima, Yuta; Rahtu, Esa; Heikkilä, Janne

Abstract

Query-based moment retrieval is the problem of localising a specific clip in an untrimmed video according to a query sentence. It is a challenging task that requires interpreting both the natural language query and the video content. As in many other areas of computer vision and machine learning, progress in query-based moment retrieval is heavily driven by benchmark datasets, so their quality has a significant impact on the field. In this paper, we present a series of experiments assessing how well benchmark results reflect true progress in solving the moment retrieval task. Our results indicate substantial biases in the popular datasets and unexpected behaviour of state-of-the-art models. Moreover, we present new sanity check experiments and approaches for visualising the results. Finally, we suggest possible directions for improving temporal sentence grounding in the future. The code for this paper is available at https://mayu-ot.github.io/hidden-challenges-MR .
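A moment retrieval model outputs a (start, end) segment for each query, and predictions in this task are commonly scored by temporal intersection-over-union (IoU) with the annotated segment. The sketch below is an illustration only, not the authors' released code or their exact experiments: it assumes segments are given in seconds, and the helper names (`temporal_iou`, `recall_at_iou`, `fit_moment_prior`, `predict_with_prior`) are hypothetical. It shows one kind of bias sanity check in the spirit of the abstract. The check uses a query-agnostic baseline that predicts the same relative moment for every video and never looks at the query or the frames. If such a baseline scores well, the benchmark's annotation distribution may be rewarding priors rather than video-language understanding.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; an assumed representation


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Segment], gts: List[Segment], threshold: float) -> float:
    """Fraction of queries whose single prediction reaches IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def fit_moment_prior(moments: List[Segment], durations: List[float]) -> Tuple[float, float]:
    """Hypothetical prior: mean start/end as a fraction of video duration.

    Fit on training annotations only; query text and video frames are ignored.
    """
    n = len(moments)
    mean_start = sum(s / d for (s, _), d in zip(moments, durations)) / n
    mean_end = sum(e / d for (_, e), d in zip(moments, durations)) / n
    return mean_start, mean_end


def predict_with_prior(prior: Tuple[float, float], duration: float) -> Segment:
    """Scale the relative prior moment to a test video's duration."""
    return prior[0] * duration, prior[1] * duration


if __name__ == "__main__":
    # Toy, made-up annotations purely for illustration.
    prior = fit_moment_prior([(0.0, 10.0), (10.0, 30.0)], [40.0, 40.0])
    pred = predict_with_prior(prior, 80.0)
    print(prior, pred)  # (0.125, 0.5) (10.0, 40.0)
    print(recall_at_iou([pred], [(12.0, 38.0)], threshold=0.5))  # 1.0
```

Comparing a trained model's recall against this kind of blind baseline on the same split is one simple way to see how much of a reported score could come from dataset bias alone.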