Paper title
Prototype-based interpretation of the functionality of neurons in winner-take-all neural networks
Authors
Abstract
Prototype-based learning (PbL) using a winner-take-all (WTA) network based on minimum Euclidean distance (ED-WTA) is an intuitive approach to multiclass classification. By constructing meaningful class centers, PbL provides higher interpretability and better generalization than hyperplane-based learning (HbL) methods based on maximum inner product (IP-WTA), and it can efficiently detect and reject samples that do not belong to any class. In this paper, we first prove the equivalence of IP-WTA and ED-WTA from a representational point of view. Then, we show that naively using this equivalence leads to unintuitive ED-WTA networks in which the centers are far from the data they represent. We propose $\pm$ED-WTA, which models each neuron with two prototypes: a positive prototype representing the samples modeled by this neuron, and a negative prototype representing the samples erroneously won by that neuron during training. We propose a novel training algorithm for the $\pm$ED-WTA network, which cleverly switches between updating the positive and negative prototypes and is essential to the emergence of interpretable prototypes. Unexpectedly, we observed that the negative prototype of each neuron is indistinguishably similar to the positive one. The rationale behind this observation is that the training samples mistaken for a prototype are indeed similar to it. The main finding of this paper is the interpretation of the functionality of neurons as computing the difference between the distances to a positive and a negative prototype, which is in agreement with the BCM theory. In our experiments, we show that the proposed $\pm$ED-WTA method constructs highly interpretable prototypes that can be successfully used for detecting outliers and adversarial examples.
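The two claims in the abstract that lend themselves to a small numerical check are the representational equivalence of ED-WTA and IP-WTA, and the reading of a neuron as a difference of distances to two prototypes. The sketch below is illustrative only and is not the authors' implementation: it assumes squared Euclidean distances and an IP-WTA layer with a bias term, neither of which the abstract specifies. All function names (`ed_wta`, `ip_wta`, `centers_to_affine`, `pm_score`, `pm_to_affine`) are hypothetical. Under those assumptions, the identity $\|x-c\|^2 = \|x\|^2 - 2c^\top x + \|c\|^2$ turns a nearest-center rule into a maximum-affine-score rule, and $\|x-n\|^2 - \|x-p\|^2 = 2(p-n)^\top x + \|n\|^2 - \|p\|^2$ shows that a positive/negative prototype pair acts as a single affine neuron.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sq_dist(x, c):
    """Squared Euclidean distance (assumption: squared, not plain, distance)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def ed_wta(x, centers):
    """Winner = index of the center with minimum squared distance to x."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def ip_wta(x, weights, biases):
    """Winner = index of the neuron with maximum affine score w.x + b."""
    return max(range(len(weights)),
               key=lambda k: dot(weights[k], x) + biases[k])


def centers_to_affine(centers):
    """Map ED-WTA centers to IP-WTA parameters with the same winners.

    ||x - c||^2 = ||x||^2 - 2 c.x + ||c||^2, and ||x||^2 is shared by all
    neurons, so minimizing distance equals maximizing c.x - ||c||^2 / 2.
    """
    weights = [list(c) for c in centers]
    biases = [-0.5 * dot(c, c) for c in centers]
    return weights, biases


def pm_score(x, pos, neg):
    """Hypothetical +/- neuron: distance to negative minus distance to positive."""
    return sq_dist(x, neg) - sq_dist(x, pos)


def pm_to_affine(pos, neg):
    """Affine neuron (w, b) giving the same score as pm_score(x, pos, neg)."""
    w = [2.0 * (p - n) for p, n in zip(pos, neg)]
    b = dot(neg, neg) - dot(pos, pos)
    return w, b


if __name__ == "__main__":
    centers = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
    weights, biases = centers_to_affine(centers)
    x = [3.0, 1.0]
    print("ED-WTA winner:", ed_wta(x, centers))
    print("IP-WTA winner:", ip_wta(x, weights, biases))

    pos, neg = [1.0, 2.0], [1.0, 1.0]
    w, b = pm_to_affine(pos, neg)
    print("prototype-pair score:", pm_score(x, pos, neg))
    print("affine score:        ", dot(w, x) + b)
```

Note that in this sketch the weight vector depends only on the difference `pos - neg`, so the two prototypes can lie very close together, and both close to the data, while still producing a given hyperplane. This is one way to read the abstract's observation that the negative prototype resembles the positive one, though the paper's own argument may differ.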