Title
Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning
Authors
Abstract
The Information Bottleneck principle offers both a mechanism to explain how deep neural networks train and generalize and a regularized objective with which to train models. However, multiple competing objectives have been proposed in the literature, and the information-theoretic quantities used in these objectives are difficult to compute for large deep neural networks, which in turn limits their use as training objectives. In this work, we review these quantities, then compare and unify previously proposed objectives, which allows us to develop surrogate objectives that are more amenable to optimization and do not rely on cumbersome tools such as density estimation. We find that these surrogate objectives make it practical to apply the information bottleneck to modern neural network architectures. We demonstrate our insights on MNIST, CIFAR-10, and Imagenette with modern DNN architectures (ResNets).
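For context, the abstract does not restate the objective it builds on. The classical Information Bottleneck Lagrangian (due to Tishby, Pereira, and Bialek) learns a stochastic representation Z of the input X that is maximally informative about the target Y while maximally compressed; the following is that standard formulation, not necessarily the exact surrogate developed in this paper:

\min_{p(z \mid x)} \; I(X; Z) \, - \, \beta \, I(Z; Y)

Here I(\cdot\,;\cdot) denotes mutual information and \beta \ge 0 sets the trade-off between compressing X and predicting Y. These mutual-information terms are the quantities the abstract describes as difficult to compute for large networks, which is what motivates the surrogate objectives proposed in the paper.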