Paper Title

Video Question Answering on Screencast Tutorials

Authors

Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin

Abstract

This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from the tutorial videos for a software product. Unlike other video question answering work, all the answers in our dataset are grounded in a domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps improve video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.
