Paper Title
Video Question Answering on Screencast Tutorials
Paper Authors
Paper Abstract
This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from the tutorial videos for a software application. Unlike other video question answering work, all the answers in our dataset are grounded in a domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps improve video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.