Paper Title

Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

Paper Authors

Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao

Paper Abstract

Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and a key is the existence of well-trained classifiers on benchmark datasets. As a mainstream loss function, the cross entropy can easily lead us to find models which demonstrate severe overfitting behavior. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. We therefore propose a mutual information learning framework in which we train deep neural network classifiers via learning the mutual information between the label and the input. Theoretically, we give a lower bound on the population classification error in terms of the mutual information. In addition, we derive mutual information lower and upper bounds for a concrete binary classification data model in $\mathbb{R}^n$, as well as the error probability lower bound in this scenario. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve far better generalization performance than the conditional entropy learned classifiers (CELCs), with an improvement in testing accuracy that can exceed 10\%.
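The abstract rests on the decomposition $I(X;Y) = H(Y) - H(Y|X)$: standard cross-entropy training estimates only the conditional entropy $H(Y|X)$, while the proposed framework targets the full mutual information between input and label. Below is a minimal PyTorch sketch of one way such an objective could be assembled, pairing the usual cross-entropy term with a plug-in estimate of the label marginal entropy; the function name and the specific marginal-entropy estimator are illustrative assumptions, not the exact objective derived in the paper.

```python
import torch
import torch.nn.functional as F

def mutual_information_objective(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch only: estimate -I(Y;X) = H(Y|X) - H(Y) from a batch."""
    # H(Y|X) estimate: the standard cross-entropy loss over the batch.
    cond_entropy = F.cross_entropy(logits, labels)

    # H(Y) estimate: entropy of the batch-averaged predictive distribution
    # (a crude plug-in estimate of the label marginal entropy).
    probs = F.softmax(logits, dim=1)
    marginal = probs.mean(dim=0)
    marginal_entropy = -(marginal * torch.log(marginal + 1e-12)).sum()

    # Maximizing I(Y;X) = H(Y) - H(Y|X) corresponds to minimizing
    # H(Y|X) - H(Y), i.e. cross entropy minus marginal entropy.
    return cond_entropy - marginal_entropy
```

In a training loop, this quantity would simply take the place of the plain cross-entropy loss before calling backward(); the point of the sketch is only to make the entropy decomposition concrete.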
