Title
NEWSKVQA: Knowledge-Aware News Video Question Answering
Authors
Abstract
Answering questions in the context of videos can support video indexing, video retrieval systems, video summarization, learning management systems, and surveillance video analysis. Although there is a large body of work on visual question answering, work on video question answering (1) is limited to domains such as movies, TV shows, gameplay, and human activity, and (2) is mostly based on common-sense reasoning. In this paper, we explore a new frontier in video question answering: answering knowledge-based questions in the context of news videos. To this end, we curate a new dataset of 12K news videos spanning 156 hours, with 1M multiple-choice question-answer pairs covering 8263 unique entities. We make the dataset publicly available. Using this dataset, we propose a novel approach, NEWSKVQA (Knowledge-Aware News Video Question Answering), which performs multi-modal inference over textual multiple-choice questions, the videos, their transcripts, and a knowledge base, and which serves as a strong baseline.
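The abstract does not describe how NEWSKVQA fuses its inputs. As a rough illustration of what multiple-choice inference over several sources involves, here is a minimal sketch. It scores each candidate answer against the transcript, entity names assumed to have been recognized in the video frames, and facts assumed to have been retrieved from a knowledge base, then picks the highest-scoring option. This is a naive token-overlap, late-fusion baseline chosen for readability. It is not the NEWSKVQA model. The names `NewsQAExample`, `coverage`, `score_option`, and `predict`, the equal default weights, and the example data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PUNCT = ".,?!:;\"'()"


def tokens(text: str) -> set[str]:
    """Lower-cased word set with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return {w for w in words if w}


def coverage(option: str, context: str) -> float:
    """Fraction of the option's tokens that also appear in the context."""
    opt = tokens(option)
    if not opt:
        return 0.0
    return len(opt & tokens(context)) / len(opt)


@dataclass
class NewsQAExample:
    """Hypothetical container for one multiple-choice question about a news clip."""
    question: str
    options: list[str]
    transcript: str  # speech transcript of the clip
    # Hypothetical: entity names recognized in the video frames.
    visual_entities: list[str] = field(default_factory=list)
    # Hypothetical: facts retrieved from a knowledge base for those entities.
    kb_facts: list[str] = field(default_factory=list)


def score_option(ex: NewsQAExample, option: str,
                 weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Late fusion: weighted sum of per-source evidence for one option."""
    w_text, w_vis, w_kb = weights
    return (w_text * coverage(option, ex.transcript)
            + w_vis * coverage(option, " ".join(ex.visual_entities))
            + w_kb * coverage(option, " ".join(ex.kb_facts)))


def predict(ex: NewsQAExample,
            weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> int:
    """Index of the highest-scoring option (ties go to the earliest option)."""
    scores = [score_option(ex, opt, weights) for opt in ex.options]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: the answer is only supported by the knowledge base.
ex = NewsQAExample(
    question="Which country is led by the person shown in the video?",
    options=["France", "Germany", "Spain", "Italy"],
    transcript="The chancellor spoke to reporters in Berlin on Monday.",
    visual_entities=["Angela Merkel"],
    kb_facts=["Angela Merkel was Chancellor of Germany."],
)
print(ex.options[predict(ex)])  # Germany
```

In this toy case the transcript and the recognized face alone do not name the country. Only the knowledge-base fact links the visual entity to the answer, which is the kind of knowledge-based question the abstract targets. Late fusion keeps each source's evidence separate until the final sum, so it is easy to see which source supported an answer. A learned model would replace `coverage` with trained encoders and learn the weights.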