Paper Title

On estimating gaze by self-attention augmented convolutions

Paper Authors

Gabriel Lefundes, Luciano Oliveira

Paper Abstract

Estimation of 3D gaze is highly relevant to multiple fields, including but not limited to interactive systems, specialized human-computer interfaces, and behavioral research. Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in the network architectures for this particular task. Therefore, we propose here a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features during the training of a shallower residual network. The rationale is that the self-attention mechanism can help outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially aware feature representations derived from the face and eye images before gaze regression. We dubbed our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones. In our experiments, results showed a decrease of the average angular error by 2.38% when compared to state-of-the-art methods on the MPIIFaceGaze data set, and second place on the EyeDiap data set. It is noteworthy that our proposed framework was the only one to reach high accuracy simultaneously on both data sets.
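The abstract refers to self-attention augmented convolutions used inside a shallow residual backbone (ARes-14). The following is a minimal PyTorch sketch of such a layer in the general style of Bello et al.'s attention-augmented convolution: a standard convolution runs in parallel with multi-head self-attention over all spatial positions, and the two outputs are concatenated along the channel axis. The class name AttnAugConv2d, the channel split (dk, dv), and all hyperparameters are illustrative assumptions, not taken from the ARes-gaze implementation, and the relative position encodings of the original formulation are omitted for brevity.

    # Illustrative sketch only; names and hyperparameters are assumptions,
    # not the paper's actual code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttnAugConv2d(nn.Module):
        """Concatenates a standard convolution with multi-head self-attention
        computed over every spatial position of the feature map."""

        def __init__(self, in_ch, out_ch, kernel_size=3, dk=40, dv=40, num_heads=4):
            super().__init__()
            assert dk % num_heads == 0 and dv % num_heads == 0
            self.dk, self.dv, self.num_heads = dk, dv, num_heads
            # Convolutional branch produces the remaining output channels.
            self.conv = nn.Conv2d(in_ch, out_ch - dv, kernel_size, padding=kernel_size // 2)
            # A single 1x1 conv projects the input to queries, keys, and values.
            self.qkv = nn.Conv2d(in_ch, 2 * dk + dv, kernel_size=1)
            # 1x1 conv mixes the attention heads back together.
            self.attn_out = nn.Conv2d(dv, dv, kernel_size=1)

        def forward(self, x):
            b, _, h, w = x.shape
            conv_out = self.conv(x)

            q, k, v = torch.split(self.qkv(x), [self.dk, self.dk, self.dv], dim=1)

            def split_heads(t, d):
                # (b, d, h, w) -> (b, heads, h*w, d per head)
                return t.reshape(b, self.num_heads, d // self.num_heads, h * w).transpose(2, 3)

            q = split_heads(q, self.dk) * (self.dk // self.num_heads) ** -0.5
            k = split_heads(k, self.dk)
            v = split_heads(v, self.dv)

            # Full self-attention over spatial positions: every location can
            # attend to every other location, even distant face regions.
            attn = F.softmax(q @ k.transpose(2, 3), dim=-1)
            o = (attn @ v).transpose(2, 3).reshape(b, self.dv, h, w)
            o = self.attn_out(o)

            # Attention features are concatenated with the convolutional features.
            return torch.cat([conv_out, o], dim=1)

    if __name__ == "__main__":
        layer = AttnAugConv2d(in_ch=64, out_ch=128)
        feats = layer(torch.randn(2, 64, 28, 28))
        print(feats.shape)  # torch.Size([2, 128, 28, 28])

Because the layer keeps the same input/output shape contract as a plain 3x3 convolution, it can in principle be dropped into a residual block, which is how an attention-augmented ResNet of the kind described above could be assembled from a shallow baseline.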
