Towards Inclusive HRI: Using Sim2Real to Address Underrepresentation in Emotion Expression Recognition

Paper Title

Towards Inclusive HRI: Using Sim2Real to Address Underrepresentation in Emotion Expression Recognition

Authors

Akhyani, Saba; Boroujeni, Mehryar Abbasi; Chen, Mo; Lim, Angelica

Abstract

Robots and artificial agents that interact with humans should be able to do so without bias and inequity, but facial perception systems have notoriously been found to work more poorly for certain groups of people than others. In our work, we aim to build a system that can perceive humans in a more transparent and inclusive manner. Specifically, we focus on dynamic expressions on the human face, which are difficult to collect for a broad set of people due to privacy concerns and the fact that faces are inherently identifiable. Furthermore, datasets collected from the Internet are not necessarily representative of the general population. We address this problem by offering a Sim2Real approach in which we use a suite of 3D simulated human models that enables us to create an auditable synthetic dataset covering 1) underrepresented facial expressions, outside of the six basic emotions, such as confusion; 2) ethnic or gender minority groups; and 3) a wide range of viewing angles from which a robot may encounter a human in the real world. By augmenting a small dynamic emotional expression dataset containing 123 samples with a synthetic dataset containing 4536 samples, we achieved an improvement in accuracy of 15% on our own dataset and 11% on an external benchmark dataset, compared to the performance of the same model architecture without synthetic training data. We also show that this additional step improves accuracy specifically for racial minorities when the architecture's feature extraction weights are trained from scratch.
