Paper Title
Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases
Paper Authors
Paper Abstract
Recent advances in machine learning leverage massive datasets of unlabeled images from the web to learn general-purpose image representations for tasks from image classification to face recognition. But do unsupervised computer vision models automatically learn implicit patterns and embed social biases that could have harmful downstream effects? We develop a novel method for quantifying biased associations between representations of social concepts and attributes in images. We find that state-of-the-art unsupervised models trained on ImageNet, a popular benchmark image dataset curated from internet images, automatically learn racial, gender, and intersectional biases. We replicate 8 previously documented human biases from social psychology, from the innocuous, as with insects and flowers, to the potentially harmful, as with race and gender. Our results closely match three hypotheses about intersectional bias from social psychology. For the first time in unsupervised computer vision, we also quantify implicit human biases about weight, disabilities, and several ethnicities. When compared with statistical patterns in online image datasets, our findings suggest that machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.
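The abstract describes a method for quantifying biased associations between representations of social concepts and attributes in images. Below is a minimal sketch of how such an embedding association test can be computed, following the WEAT-style differential-association effect size that the paper adapts to image embeddings. The function names and the toy random embeddings are illustrative assumptions, not the authors' code; in the paper's setting, the embeddings would come from unsupervised models pre-trained on ImageNet.

```python
# Minimal sketch of an embedding association test (WEAT-style effect size).
# All names and the toy data below are hypothetical illustrations.
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w: np.ndarray, A: list, B: list) -> float:
    """Differential association of one target embedding with attribute sets A and B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))


def effect_size(X: list, Y: list, A: list, B: list) -> float:
    """Effect size d: how much more strongly targets X associate with A (vs. B) than targets Y do."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled_std = np.std(x_assoc + y_assoc, ddof=1)
    return float((np.mean(x_assoc) - np.mean(y_assoc)) / pooled_std)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 128
    # Hypothetical image embeddings: X/Y are target concepts (e.g. flowers/insects),
    # A/B are attribute concepts (e.g. pleasant/unpleasant images).
    X = [rng.normal(size=dim) for _ in range(10)]
    Y = [rng.normal(size=dim) for _ in range(10)]
    A = [rng.normal(size=dim) for _ in range(10)]
    B = [rng.normal(size=dim) for _ in range(10)]
    print(f"effect size d = {effect_size(X, Y, A, B):.3f}")
```

A positive effect size indicates that the X targets are more strongly associated with attribute set A than the Y targets are; statistical significance is typically assessed with a permutation test over the target groups.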