Paper title

Large image datasets: A pyrrhic win for computer vision?

Authors

Vinay Uday Prabhu, Abeba Birhane

Abstract

In this paper we investigate problematic practices and consequences of large scale vision datasets. We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an example, we perform a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions. We then use the census to help hand-curate a look-up-table of images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of verifiably pornographic: shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed private parts. We survey the landscape of harm and threats both society broadly and individuals face due to uncritical and ill-considered dataset curation practices. We then propose possible courses of correction and critique the pros and cons of these. We have duly open-sourced all of the code and the census meta-datasets generated in this endeavor for the computer vision community to build on. By unveiling the severity of the threats, our hope is to motivate the constitution of mandatory Institutional Review Boards (IRB) for large scale dataset curation processes.
