Paper Title
Explaining Classifiers by Constructing Familiar Concepts
Paper Authors
Paper Abstract
Interpreting a large number of neurons in deep learning is difficult. Our proposed 'CLAssifier-DECoder' architecture (ClaDec) facilitates the understanding of the output of an arbitrary layer of neurons or subsets thereof. It uses a decoder that transforms the incomprehensible representation of the given neurons into a representation that is more similar to the domain a human is familiar with. In an image recognition problem, one can recognize what information (or concepts) a layer maintains by contrasting reconstructed images of ClaDec with those of a conventional auto-encoder (AE) serving as a reference. An extension of ClaDec allows trading off comprehensibility against fidelity. We evaluate our approach for image classification using convolutional neural networks. We show that reconstructed visualizations using encodings from a classifier capture more relevant classification information than conventional AEs. This holds although AEs contain more information on the original input. Our user study highlights that even non-experts can identify a diverse set of concepts contained in images that are relevant (or irrelevant) to the classifier. We also compare against saliency-based methods that focus on pixel relevance rather than concepts. We show that ClaDec tends to highlight input areas more relevant to classification, though outcomes depend on the classifier architecture. Code is at https://github.com/JohnTailor/ClaDec
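To make the architecture concrete, below is a minimal sketch of the ClaDec idea, assuming PyTorch and 28x28 grayscale inputs. The layer chosen for explanation, the network sizes, and the names Classifier, Decoder, and cladec_step are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
# Minimal ClaDec sketch (assumptions: PyTorch, 28x28 grayscale inputs).
# Illustrative only; not the authors' exact implementation.
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """Small CNN; the output of `features` is the layer ClaDec explains."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                     # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                     # 14x14 -> 7x7
        )
        self.head = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        z = self.features(x)             # representation to be explained
        return self.head(z.flatten(1)), z

class Decoder(nn.Module):
    """Maps the explained layer's activations back into image space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 14x14 -> 28x28
        )

    def forward(self, z):
        return self.net(z)

def cladec_step(classifier, decoder, x, opt):
    """One ClaDec step: the pre-trained classifier stays frozen; only the
    decoder learns to reconstruct the input from the layer's encoding."""
    with torch.no_grad():
        _, z = classifier(x)             # frozen encoding from the classifier
    x_hat = decoder(z)
    loss = nn.functional.mse_loss(x_hat, x)
    opt.zero_grad()
    loss.backward()                      # updates the decoder only
    opt.step()
    return loss.item()

# Usage (illustrative): train `clf` on the classification task first, then
#   dec = Decoder()
#   opt = torch.optim.Adam(dec.parameters())
#   for x, _ in loader:
#       cladec_step(clf, dec, x, opt)
```

The reference AE would use the same decoder but pair it with an encoder of identical shape trained jointly for reconstruction; differences between its outputs and ClaDec's then indicate which concepts the classifier layer keeps or discards. The extension mentioned in the abstract can, on our reading, be sketched as adding a weighted classification loss on the reconstruction to the decoder objective, trading fidelity for comprehensibility.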