Paper Title
Information theoretic limits of learning a sparse rule
Paper Authors
Paper Abstract
We consider generalized linear models in regimes where the number of nonzero components of the signal and accessible data points are sublinear with respect to the size of the signal. We prove a variational formula for the asymptotic mutual information per sample when the system size grows to infinity. This result allows us to derive an expression for the minimum mean-square error (MMSE) of the Bayesian estimator when the signal entries have a discrete distribution with finite support. We find that, for such signals and suitable vanishing scalings of the sparsity and sampling rate, the MMSE is nonincreasing piecewise constant. In specific instances the MMSE even displays an all-or-nothing phase transition, that is, the MMSE sharply jumps from its maximum value to zero at a critical sampling rate. The all-or-nothing phenomenon has previously been shown to occur in high-dimensional linear regression. Our analysis goes beyond the linear case and applies to learning the weights of a perceptron with general activation function in a teacher-student scenario. In particular, we discuss an all-or-nothing phenomenon for the generalization error with a sublinear set of training examples.
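To make the setting concrete, here is a minimal sketch of one teacher-student generalized linear model instance of the kind the abstract describes: a k-sparse signal of size n and m observations obtained by passing noisy linear projections through an activation function. The function name, the binary choice of signal prior, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sparse_glm_instance(n, k, m, activation, noise_std=0.1, seed=0):
    """Draw one teacher-student GLM instance (illustrative sketch):
    a k-sparse signal x* of size n with binary entries (a discrete
    distribution with finite support), and m observations
    y_mu = activation(<a_mu, x*> / sqrt(n)) + noise."""
    rng = np.random.default_rng(seed)
    x_star = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_star[support] = 1.0            # k nonzero components
    A = rng.standard_normal((m, n))  # i.i.d. Gaussian data matrix
    z = A @ x_star / np.sqrt(n)      # pre-activations
    y = activation(z) + noise_std * rng.standard_normal(m)
    return A, x_star, y

# The identity activation recovers high-dimensional linear regression;
# a nonlinearity such as np.tanh gives a perceptron-like teacher.
# Here k = 100 and m = 500 are both sublinear in n = 10_000.
A, x_star, y = sparse_glm_instance(n=10_000, k=100, m=500, activation=np.tanh)
```

In the sublinear regimes studied in the abstract, both k and m would scale slower than n; the fixed sizes above are only for demonstration.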