Paper Title
How Trustworthy are Performance Evaluations for Basic Vision Tasks?
Paper Authors
Paper Abstract
This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking. The rankings of algorithms by an existing criterion can fluctuate with different choices of parameters, e.g. the Intersection over Union (IoU) threshold, making its evaluations unreliable. More importantly, there is no means to verify whether we can trust the evaluations of a criterion. This work suggests a notion of trustworthiness for performance criteria, which requires (i) robustness to parameters for reliability, (ii) contextual meaningfulness in sanity tests, and (iii) consistency with mathematical requirements such as the metric properties. We observe that these requirements were overlooked by many widely used criteria, and explore alternative criteria using metrics for sets of shapes. We also assess all these criteria based on the suggested requirements for trustworthiness.
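To make the abstract's point about parameter sensitivity concrete, the following is a minimal Python sketch of how a ranking can flip when only the IoU threshold changes. It is not the paper's evaluation protocol: the boxes, the detector names `det_a` and `det_b`, the helper `hits`, and the index-wise pairing of predictions with ground truth are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hits(preds, truths, thr):
    """Count predictions whose IoU with the same-index truth box is >= thr.

    Index-wise pairing is a simplifying assumption; real benchmarks
    solve a matching problem between predictions and ground truth.
    """
    return sum(iou(p, t) >= thr for p, t in zip(preds, truths))


# Hypothetical toy data: two ground-truth boxes and two detectors.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_a = [(2, 0, 12, 10), (22, 20, 32, 30)]   # both boxes slightly shifted
det_b = [(0, 0, 10, 10), (25, 20, 35, 30)]   # one exact box, one poor box

for thr in (0.5, 0.75):
    a, b = hits(det_a, truths, thr), hits(det_b, truths, thr)
    best = "A" if a > b else "B" if b > a else "tie"
    print(f"IoU threshold {thr}: A hits={a}, B hits={b}, ranked first: {best}")
```

In this toy setup detector A scores IoU 2/3 on both objects, while detector B scores 1 on one object and 1/3 on the other, so A counts more hits at threshold 0.5 and B counts more at 0.75. The same predictions produce opposite rankings, which is the kind of fluctuation the abstract describes.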