Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency

Paper title

Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency

Authors

Robert Geirhos, Kristof Meding, Felix A. Wichmann

Abstract

A central problem in cognitive science and behavioural neuroscience as well as in machine learning and artificial intelligence research is to ascertain whether two or more decision makers (be they brains or algorithms) use the same strategy. Accuracy alone cannot distinguish between strategies: two systems may achieve similar accuracy with very different strategies. The need to differentiate beyond accuracy is particularly pressing if two systems are near ceiling performance, like Convolutional Neural Networks (CNNs) and humans on visual object recognition. Here we introduce trial-by-trial error consistency, a quantitative analysis for measuring whether two decision making systems systematically make errors on the same inputs. Making consistent errors on a trial-by-trial basis is a necessary condition for similar processing strategies between decision makers. Our analysis is applicable to compare algorithms with algorithms, humans with humans, and algorithms with humans. When applying error consistency to object recognition we obtain three main findings: (1.) Irrespective of architecture, CNNs are remarkably consistent with one another. (2.) The consistency between CNNs and human observers, however, is little above what can be expected by chance alone -- indicating that humans and CNNs are likely implementing very different strategies. (3.) CORnet-S, a recurrent model termed the "current best model of the primate ventral visual stream", fails to capture essential characteristics of human behavioural data and behaves essentially like a standard purely feedforward ResNet-50 in our analysis. Taken together, error consistency analysis suggests that the strategies used by human and machine vision are still very different -- but we envision our general-purpose error consistency analysis to serve as a fruitful tool for quantifying future progress.
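The abstract does not give a formula for error consistency. As a minimal sketch, assume it is computed as Cohen's kappa over per-trial correct/incorrect outcomes: the observed fraction of trials on which two decision makers are both right or both wrong, corrected for the agreement two independent observers with the same accuracies would reach by chance. The function name and the boolean input format below are illustrative assumptions, not taken from the abstract.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Chance-corrected agreement (Cohen's kappa) between two decision makers.

    Each input holds one entry per trial: True if that decision maker
    answered the trial correctly, False otherwise. Both sequences must
    refer to the same trials in the same order.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(bool(x) for x in correct_a) / n
    acc_b = sum(bool(x) for x in correct_b) / n

    # Observed consistency: both correct or both wrong on the same trial.
    c_obs = sum(bool(a) == bool(b) for a, b in zip(correct_a, correct_b)) / n

    # Consistency expected by chance from the two accuracies alone.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if c_exp == 1.0:
        # Both observers are always right (or always wrong): kappa is undefined.
        return float("nan")
    return (c_obs - c_exp) / (1 - c_exp)
```

Under this assumption, a value near 0 means the two systems agree no more often than their accuracies alone would predict, while a value near 1 means they err on the same inputs. That is the sense in which CNN-to-human consistency can be "little above chance" even when accuracies are similar.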

Paper title