Constructing a Visual Relationship Authenticity Dataset

Paper Title

Constructing a Visual Relationship Authenticity Dataset

Authors

Chu, Chenhui; Takebayashi, Yuto; Vipul, Mishra; Nakashima, Yuta

Abstract

A visual relationship denotes a relationship between two objects in an image and can be represented as a triplet of (subject; predicate; object). Visual relationship detection is crucial for scene understanding in images. Existing visual relationship detection datasets contain only true relationships that correctly describe the content of an image. However, distinguishing false visual relationships from true ones is also crucial for image understanding and grounded natural language processing. In this paper, we construct a visual relationship authenticity dataset in which both true and false relationships among all objects that appear in the captions of the Flickr30k Entities image caption dataset are annotated. The dataset is available at https://github.com/codecreator2053/VR_ClassifiedDataset. We hope that this dataset can promote research on both vision and language understanding.
