Fixed-size Objects Encoding for Visual Relationship Detection

Authors

Hengyue Pan, Xin Niu, Rongchun Li, Siqi Shen, Yong Dou

Abstract

In this paper, we propose a fixed-size object encoding method (FOE-VRD) to improve the performance of visual relationship detection. Compared with previous methods, FOE-VRD has one important feature: it uses a single fixed-size vector to encode all objects in each input image to assist relationship detection. First, we use a regular convolutional neural network as a feature extractor to generate high-level features of the input image. Then, for each relationship triplet in the input image, i.e., $<$subject-predicate-object$>$, we apply ROI pooling to the feature maps to obtain feature vectors for the two regions corresponding to the bounding boxes of the subject and the object. Besides the subject and object, our analysis implies that the result of predicate classification may also be related to the remaining objects in the input image (we call them background objects). Because the number of background objects varies across images, and because of the computational cost, we cannot generate feature vectors for them one by one using ROI pooling. Instead, we propose a novel method that encodes all background objects in each image using one fixed-size vector (the FBE vector). By concatenating the three vectors generated above, we encode the objects using one fixed-size vector. The resulting feature vector is then fed into a fully connected neural network to obtain the predicate classification results. Experimental results on the VRD database (entire set and zero-shot tests) show that the proposed method works well on both predicate classification and relationship detection.
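
The pipeline described in the abstract (ROI pooling for the subject and object, one fixed-size vector for the background objects, concatenation, then a fully connected layer) can be illustrated with a minimal pure-Python sketch. The abstract does not say how the FBE vector is computed, so `encode_background` below is a hypothetical stand-in (a per-category count histogram) and not the authors' encoding. All function names, sizes, and the random weights are assumptions made for illustration only.

```python
import random


def roi_max_pool(fmap, box, out_size=2):
    """Max-pool region box = (x0, y0, x1, y1) of a C x H x W feature map.

    Coordinates are feature-map indices with x1 and y1 exclusive.
    Returns a flat list of length C * out_size * out_size.
    """
    x0, y0, x1, y1 = box
    vec = []
    for channel in fmap:
        for i in range(out_size):
            # Row range of bin i; every bin covers at least one row.
            r0 = y0 + (y1 - y0) * i // out_size
            r1 = max(r0 + 1, y0 + (y1 - y0) * (i + 1) // out_size)
            for j in range(out_size):
                c0 = x0 + (x1 - x0) * j // out_size
                c1 = max(c0 + 1, x0 + (x1 - x0) * (j + 1) // out_size)
                vec.append(max(channel[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return vec


def encode_background(labels, num_classes):
    """Hypothetical stand-in for the FBE vector: per-category object counts.

    The length is always num_classes, however many background objects exist.
    """
    counts = [0.0] * num_classes
    for label in labels:
        counts[label] += 1.0
    return counts


def fully_connected(vec, weights, bias):
    """One dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def predicate_scores(fmap, subj_box, obj_box, bg_labels, num_classes,
                     weights, bias):
    """Concatenate subject, object, and background vectors, then classify."""
    feature = (roi_max_pool(fmap, subj_box)
               + roi_max_pool(fmap, obj_box)
               + encode_background(bg_labels, num_classes))
    return fully_connected(feature, weights, bias)


# Toy setup: every size below is arbitrary and chosen only for the demo.
random.seed(0)
C, H, W = 3, 8, 8
NUM_CLASSES, NUM_PREDICATES = 5, 4
fmap = [[[random.random() for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
feat_len = 2 * C * 2 * 2 + NUM_CLASSES  # two ROI vectors plus background
weights = [[random.gauss(0, 0.1) for _ in range(feat_len)]
           for _ in range(NUM_PREDICATES)]
bias = [0.0] * NUM_PREDICATES

# The input to the classifier has the same length whether the image
# contains one background object or six.
scores_few = predicate_scores(fmap, (0, 0, 4, 4), (3, 2, 8, 7), [1],
                              NUM_CLASSES, weights, bias)
scores_many = predicate_scores(fmap, (0, 0, 4, 4), (3, 2, 8, 7),
                               [1, 1, 3, 4, 0, 2],
                               NUM_CLASSES, weights, bias)
print(len(scores_few), len(scores_many))
```

The point of the sketch is the shape argument: because the background encoding has a fixed length, the concatenated feature vector can be fed to a fully connected layer regardless of how many objects an image contains.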
