Paper title
Understanding Fairness of Gender Classification Algorithms Across Gender-Race Groups
Paper authors
Abstract
Automated gender classification has important applications in many domains, such as demographic research, law enforcement, online advertising, and human-computer interaction. Recent research has questioned the fairness of this technology across gender and race. Specifically, most studies have raised concerns about higher error rates of face-based gender classification systems for darker-skinned people, such as African Americans, and for women. However, to date, most existing studies have been limited to African Americans and Caucasians. The aim of this paper is to investigate the differential performance of gender classification algorithms across gender-race groups. To this end, we investigate the impact of (a) architectural differences in the deep learning algorithms and (b) training set imbalance as potential sources of bias causing differential performance across gender and race. Experimental investigations are conducted on two recent large-scale, publicly available facial attribute datasets, namely UTKFace and FairFace. The experimental results suggest that algorithms with different architectures vary in overall performance but show consistent patterns across specific gender-race groups. For instance, for all the algorithms used, Black females (and the Black race in general) always obtained the lowest accuracy rates. Middle Eastern males and Latino females obtained higher accuracy rates most of the time. Training set imbalance further widens the gap in accuracy rates across all gender-race groups. Further investigations using facial landmarks suggest that facial morphological differences, arising from bone structure influenced by genetic and environmental factors, could be the cause of the lowest performance for Black females and for the Black race in general.
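The differential performance described in the abstract is measured as accuracy per gender-race group. A minimal sketch of that kind of measurement follows, assuming predictions are available as (race, true gender, predicted gender) triples. The function names `accuracy_by_group` and `accuracy_gap`, the placeholder labels `R1` and `R2`, and the toy records are all hypothetical and are not taken from the paper or its datasets.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return classification accuracy for each (race, gender) group.

    records: iterable of (race, true_gender, predicted_gender) triples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for race, true_gender, predicted_gender in records:
        group = (race, true_gender)
        total[group] += 1
        if predicted_gender == true_gender:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Made-up toy records with placeholder race labels; not experimental results.
records = [
    ("R1", "Female", "Male"),
    ("R1", "Female", "Female"),
    ("R1", "Male", "Male"),
    ("R1", "Male", "Male"),
    ("R2", "Female", "Female"),
    ("R2", "Female", "Female"),
    ("R2", "Male", "Male"),
    ("R2", "Male", "Female"),
]

per_group = accuracy_by_group(records)
gap = accuracy_gap(per_group)

for group, accuracy in sorted(per_group.items()):
    print(group, accuracy)
print("gap:", gap)
```

Reporting the gap between the best and worst group alongside the per-group figures is one simple way to summarize unequal accuracy rates; the paper may use a different summary.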