Title
Multi-Granularity Graph Pooling for Video-based Person Re-Identification
Authors
Abstract
Video-based person re-identification (ReID) aims to identify a given pedestrian video sequence across multiple non-overlapping cameras. To aggregate the temporal and spatial features of video samples, graph neural networks (GNNs) have been introduced. However, existing graph-based models, like STGCN, perform \textit{mean}/\textit{max pooling} on node features to obtain the graph representation, which neglects the graph topology and node importance. In this paper, we propose the graph pooling network (GPNet) to learn a multi-granularity graph representation for video retrieval, where a \textit{graph pooling layer} is implemented to downsample the graph. We first construct a multi-granular graph, whose node features denote the image embeddings learned by the backbone, and whose edges connect temporal and Euclidean neighborhood nodes. We then apply multiple graph convolutional layers to perform neighborhood aggregation on the graphs. To downsample the graph, we propose a multi-head full attention graph pooling (MHFAPool) layer, which integrates the advantages of existing node-clustering and node-selection pooling methods. Specifically, MHFAPool takes the principal eigenvector of the full attention matrix as the aggregation coefficients, so that global graph information is involved in each pooled node. Extensive experiments demonstrate that our GPNet achieves competitive results on four widely-used datasets, i.e., MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID-2011.
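The core pooling step described above can be illustrated with a minimal sketch: build a full attention matrix over the node features, extract its principal eigenvector (here via power iteration), and use the normalized eigenvector entries as aggregation coefficients so that every node contributes to the pooled representation according to its global importance. All function names below are hypothetical, and the details (scaled dot-product attention, single head, pooling to one vector) are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def full_attention_pool(x, iters=100):
    """Pool node features x of shape (n, d) into one d-dimensional vector
    using the principal eigenvector of a full attention matrix.
    A simplified, single-head sketch of the MHFAPool idea."""
    n, d = x.shape

    # Full attention matrix: row-wise softmax over scaled dot products.
    scores = x @ x.T / np.sqrt(d)                      # (n, n)
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)            # row-stochastic (n, n)

    # Principal (left) eigenvector via power iteration.
    v = np.ones(n) / n
    for _ in range(iters):
        v = attn.T @ v
        v /= np.linalg.norm(v)

    # Normalize entries into aggregation coefficients.
    coeffs = np.abs(v) / np.abs(v).sum()               # (n,), sums to 1

    # Weighted aggregation: global graph information enters the pooled node.
    return coeffs @ x                                  # (d,)
```

Because the attention matrix is row-stochastic, its principal left eigenvector is well defined (spectral radius 1), so the coefficients converge to a stable global importance weighting rather than depending on any single node's local neighborhood.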