Paper Title

Information Flow in Deep Neural Networks

Author

Shwartz-Ziv, Ravid

Abstract

Although deep neural networks have been immensely successful, there is no comprehensive theoretical understanding of how they work or are structured. As a result, deep networks are often seen as black boxes with unclear interpretations and reliability. Understanding the performance of deep neural networks is one of the greatest scientific challenges. This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms. We first describe our information-theoretic approach to deep learning. Then, we propose using the Information Bottleneck (IB) theory to explain deep learning systems. This novel paradigm for analyzing networks sheds light on their layered structure, generalization abilities, and learning dynamics. We later discuss one of the most challenging problems in applying the IB to deep neural networks: estimating mutual information. Recent theoretical developments, such as the neural tangent kernel (NTK) framework, are used to investigate generalization signals. In our study, we obtain tractable computations of many information-theoretic quantities and their bounds for infinite ensembles of infinitely wide neural networks. With these derivations, we can determine how compression, generalization, and sample size pertain to the network and how they relate to one another. Finally, we present the dual Information Bottleneck (dualIB). This new information-theoretic framework resolves some of the IB's shortcomings by merely switching terms in the distortion function. The dualIB can account for known data features and use them to make better predictions on unseen examples. An analytical framework reveals the underlying structure and optimal representations, and a variational framework using deep neural network optimization validates the results.
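
For readers unfamiliar with the framework the abstract builds on, the display below sketches the IB objective and the distortion swap that defines the dualIB. This is standard notation from the IB literature, written out here for orientation rather than quoted from the thesis:

```latex
% Information Bottleneck: compress the input X into a representation T
% while preserving information about the label Y
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)

% Rate-distortion view: the IB distortion is a KL divergence
d_{\mathrm{IB}}(x,t) = D_{\mathrm{KL}}\big[\, p(y \mid x) \,\|\, p(\hat{y} \mid t) \,\big]

% The dualIB "switches terms in the distortion function":
% the arguments of the KL divergence are exchanged
d_{\mathrm{dualIB}}(x,t) = D_{\mathrm{KL}}\big[\, p(\hat{y} \mid t) \,\|\, p(y \mid x) \,\big]
```

Reversing the KL arguments changes which aspects of p(y|x) the representation is forced to match, which is what lets the dualIB exploit known data features when predicting on unseen examples, as the abstract notes.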
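
The abstract singles out mutual-information estimation as the hardest step in applying the IB to deep networks. As one concrete and deliberately simple illustration, here is a minimal binning-style plug-in estimator of the kind commonly used in this line of work; the function names, the uniform binning, and the toy data are assumptions for the sketch, not the thesis's actual procedure:

```python
import numpy as np

def bin_layer(t, n_bins=30):
    """Discretize continuous activations into per-unit bins, then map each
    activation vector to a single discrete symbol (unique-row id)."""
    edges = np.linspace(t.min(), t.max(), n_bins + 1)[1:-1]
    binned = np.digitize(t, edges)                      # (n_samples, n_units)
    _, symbols = np.unique(binned, axis=0, return_inverse=True)
    return symbols                                      # one discrete id per sample

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits from paired discrete samples."""
    n = len(a)
    pairs, counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    p_ab = counts / n
    p_a = np.bincount(a) / n
    p_b = np.bincount(b) / n
    return float(np.sum(p_ab * np.log2(p_ab / (p_a[pairs[:, 0]] * p_b[pairs[:, 1]]))))

# Toy usage: I(X;T) for a fake "layer" whose activations leak some input info
rng = np.random.default_rng(0)
x_ids = rng.integers(0, 8, size=512)                    # discrete stand-in inputs
t = rng.normal(size=(512, 16)) + 0.5 * x_ids[:, None]   # fake layer activations
print("I(X;T) ~", mutual_information(x_ids, bin_layer(t)), "bits")
```

Plug-in estimators like this are sensitive to the bin count and degrade in high dimensions, which is part of why the thesis turns to bounds and to the infinite-width (NTK) regime, where such information-theoretic quantities become tractable.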
