Paper Title
Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification?
Paper Authors
Paper Abstract
Recent studies show that models trained on synthetic datasets can achieve better generalizable person re-identification (GPReID) performance than those trained on public real-world datasets. On the other hand, due to the limitations of real-world person ReID datasets, it would also be important and interesting to use large-scale synthetic datasets as test sets to benchmark person ReID algorithms. Yet this raises a critical question: are synthetic datasets reliable for benchmarking generalizable person re-identification? The literature offers no evidence on this question. To address it, we design a method called Pairwise Ranking Analysis (PRA) to quantitatively measure ranking similarity and to perform a statistical test of identical distributions. Specifically, we employ Kendall rank correlation coefficients to evaluate the pairwise similarity between algorithm rankings on different datasets. Then, a non-parametric two-sample Kolmogorov-Smirnov (KS) test is performed to judge whether the algorithm ranking correlations between synthetic and real-world datasets and those between real-world datasets only follow the same distribution. We conduct comprehensive experiments with ten representative algorithms, three popular real-world person ReID datasets, and three recently released large-scale synthetic datasets. Through the designed pairwise ranking analysis and comprehensive evaluations, we conclude that a recent large-scale synthetic dataset, ClonedPerson, can be reliably used to benchmark GPReID, statistically the same as real-world datasets. Therefore, this study guarantees the usage of synthetic datasets as both source training sets and target testing sets, with no privacy concerns from real-world surveillance data. Besides, the study in this paper might also inspire future designs of synthetic datasets.
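
The two statistical steps named in the abstract (Kendall rank correlation between dataset pairs, then a two-sample KS test on the resulting correlation values) can be illustrated with a minimal sketch. This is not the authors' implementation: the dataset names, the five-algorithm setup, and every score below are hypothetical placeholders, the Kendall coefficient is the simple tau-a variant without tie correction, and the KS p-value uses an asymptotic approximation that is rough for samples this small. In practice, `scipy.stats.kendalltau` and `scipy.stats.ks_2samp` would likely be used instead of the hand-written helpers.

```python
from itertools import combinations
from math import exp, sqrt


def kendall_tau(x, y):
    """Kendall tau-a between two equally long score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic (approximate) p-value."""
    d = 0.0
    for v in list(a) + list(b):
        cdf_a = sum(1 for t in a if t <= v) / len(a)
        cdf_b = sum(1 for t in b if t <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    en = sqrt(len(a) * len(b) / (len(a) + len(b)))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The Kolmogorov tail probability is numerically 1 in this range.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical placeholder scores (NOT results from the paper):
# one accuracy value per algorithm, in the same algorithm order for each dataset.
scores = {
    "real_A": [0.30, 0.35, 0.42, 0.47, 0.55],
    "real_B": [0.28, 0.36, 0.33, 0.45, 0.50],
    "real_C": [0.20, 0.18, 0.27, 0.31, 0.40],
    "synthetic_X": [0.50, 0.58, 0.61, 0.59, 0.70],
}
real = ["real_A", "real_B", "real_C"]

# Ranking correlations between pairs of real-world datasets.
real_real = [kendall_tau(scores[a], scores[b]) for a, b in combinations(real, 2)]
# Ranking correlations between the synthetic dataset and each real-world dataset.
synth_real = [kendall_tau(scores["synthetic_X"], scores[r]) for r in real]

d_stat, p_value = ks_two_sample(synth_real, real_real)
print("real-real taus:", real_real)
print("synthetic-real taus:", synth_real)
print("KS statistic:", d_stat, "approx. p-value:", p_value)
```

Under this reading, a large p-value means the test does not reject the hypothesis that the synthetic-vs-real correlations and the real-vs-real correlations come from the same distribution, which is the sense in which a synthetic dataset would be judged statistically comparable to real-world ones as a benchmark.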