Title

A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features

Authors

Zhenmei Shi, Junyi Wei, Yingyu Liang

Abstract

An important characteristic of neural networks is their ability to learn representations of the input data with effective features for prediction, which is believed to be a key factor to their superior empirical performance. To better understand the source and benefit of feature learning in neural networks, we consider learning problems motivated by practical data, where the labels are determined by a set of class relevant patterns and the inputs are generated from these along with some background patterns. We prove that neural networks trained by gradient descent can succeed on these problems. The success relies on the emergence and improvement of effective features, which are learned among exponentially many candidates efficiently by exploiting the data (in particular, the structure of the input distribution). In contrast, no linear models on data-independent features of polynomial sizes can learn to as good errors. Furthermore, if the specific input structure is removed, then no polynomial algorithm in the Statistical Query model can learn even weakly. These results provide theoretical evidence showing that feature learning in neural networks depends strongly on the input structure and leads to the superior performance. Our preliminary experimental results on synthetic and real data also provide positive support.
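To make the setup described in the abstract concrete, below is a minimal sketch of the kind of data model and training procedure it refers to: inputs built from a small set of class-relevant patterns plus background patterns, and a two-layer ReLU network trained by gradient descent so that useful first-layer features can emerge. The specific distributions, network width, loss, and hyperparameters here are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch, assuming a dictionary-of-patterns data model and a
# two-layer ReLU network; the paper's exact setup and training may differ.
import numpy as np

rng = np.random.default_rng(0)

d, n = 50, 2000            # input dimension, number of samples
k_rel, k_bg = 4, 20        # number of class-relevant / background patterns
patterns = rng.standard_normal((k_rel + k_bg, d)) / np.sqrt(d)

def sample(n):
    """Each input = one class-relevant pattern (which fixes the label) plus a
    few random background patterns (a hypothetical generative choice)."""
    rel = rng.integers(0, k_rel, size=n)              # which relevant pattern
    y = np.where(rel < k_rel // 2, 1.0, -1.0)         # label set by that pattern
    X = patterns[rel].copy()
    for i in range(n):
        bg = rng.choice(k_bg, size=3, replace=False) + k_rel
        X[i] += patterns[bg].sum(axis=0)
    return X, y

X, y = sample(n)

# Two-layer ReLU network trained by plain gradient descent on logistic loss.
# The second layer is held fixed only to keep the sketch short.
m, lr, steps = 128, 1.0, 1000
W = rng.standard_normal((m, d)) * 0.1                 # first-layer weights (learned features)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)      # fixed second-layer weights

for t in range(steps):
    H = np.maximum(X @ W.T, 0.0)                      # hidden activations
    f = H @ a                                         # network output
    g = -y / (1.0 + np.exp(y * f))                    # d(logistic loss)/d(output)
    dW = ((g[:, None] * (X @ W.T > 0)) * a).T @ X / n # backprop to first layer
    W -= lr * dW                                      # gradient step: features improve

acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
print(f"training accuracy after gradient descent: {acc:.3f}")
```

In this toy version, the averaged gradient pushes each first-layer weight toward the class-relevant pattern directions, which is the intuition behind the "emergence of effective features" claim; a linear model on fixed, data-independent features would have no comparable mechanism for discovering those directions.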
