Paper Title
The human visual system and CNNs can both support robust online translation tolerance following extreme displacements
Authors
Abstract
Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near-complete invariance over 10° and others reporting zero invariance at 4° of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized with high accuracy following translations of up to 18°. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a Global Average Pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with CNNs.
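The GAP mechanism the abstract mentions can be illustrated with a toy NumPy sketch (not the paper's actual DCNN; the pattern, canvas size, and placements below are hypothetical). A filter's response map translates along with the stimulus, so averaging over the entire map yields the same value at any location where the pattern fits fully inside the input, which is why a GAP layer confers translation-tolerant features:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def gap(feature_map):
    """Global Average Pooling: collapse the spatial map to one scalar."""
    return feature_map.mean()

def place(pattern, canvas_size, top, left):
    """Place a small pattern on a blank canvas at a given retinal location."""
    canvas = np.zeros((canvas_size, canvas_size))
    ph, pw = pattern.shape
    canvas[top:top + ph, left:left + pw] = pattern
    return canvas

# Hypothetical 3x3 "object" used both as stimulus and as matched filter
pattern = np.array([[0., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 0.]])

img_trained   = place(pattern, 20, 2, 2)    # training location
img_displaced = place(pattern, 20, 12, 14)  # large displacement

feat_trained   = gap(conv2d_valid(img_trained, pattern))
feat_displaced = gap(conv2d_valid(img_displaced, pattern))
# The pooled response is identical at both locations, because the filter's
# response map shifts with the stimulus while its average is unchanged.
```

In a real DCNN the GAP layer averages each channel of the final convolutional feature maps before the classifier, so the same invariance argument applies per channel.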