Paper Title
Neural Random Projection: From the Initial Task To the Input Similarity Problem
Paper Authors
Paper Abstract
In this paper, we propose a novel approach for implicit data representation to evaluate the similarity of input data using a trained neural network. In contrast to the previous approach, which uses gradients for representation, we utilize only the outputs of the last hidden layer of a neural network and do not use a backward step. The proposed technique explicitly takes the initial task into account and significantly reduces the size of the vector representation, as well as the computation time. The key point is the minimization of information loss between layers. Generally, a neural network discards information that is not related to the problem, which makes the last hidden layer representation useless for the input similarity task. In this work, we consider two main causes of information loss: correlation between neurons and insufficient size of the last hidden layer. To reduce the correlation between neurons, we use orthogonal weight initialization for each layer and modify the loss function to ensure orthogonality of the weights during training. Moreover, we show that activation functions can potentially increase correlation. To solve this problem, we apply a modified Batch-Normalization with Dropout. Using orthogonal weight matrices allows us to consider such neural networks as an application of the Random Projection method and to obtain a lower-bound estimate for the size of the last hidden layer. We perform experiments on the MNIST and physical examination datasets. In both experiments, we first split the set of labels into two disjoint subsets to train a neural network for a binary classification problem, and then use this model to measure similarity between input data and define hidden classes. Our experimental results show that the proposed approach achieves competitive results on the input similarity task while reducing both the computation time and the size of the input representation.
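The abstract names two concrete mechanisms: orthogonal weight initialization per layer and a loss-function modification that keeps the weights orthogonal during training, with Batch-Normalization and Dropout applied after the activation. The following PyTorch snippet is a minimal sketch of these ingredients, not the authors' code: the network sizes, ReLU activations, and the penalty weight `beta` are illustrative assumptions, and plain `BatchNorm1d` with `Dropout` stands in for the paper's modified variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrthoNet(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, out_dim=2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)  # last hidden layer
        self.bn = nn.BatchNorm1d(hidden_dim)          # stand-in for the modified Batch-Normalization
        self.drop = nn.Dropout(p=0.5)
        self.out = nn.Linear(hidden_dim, out_dim)
        # Orthogonal weight initialization for each layer.
        for layer in (self.fc1, self.fc2, self.out):
            nn.init.orthogonal_(layer.weight)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        h = self.drop(self.bn(F.relu(self.fc2(h))))  # representation used for similarity
        return self.out(h)

def orthogonality_penalty(model):
    # Frobenius distance of each Gram matrix W W^T from the identity;
    # penalizing it keeps the rows of each weight matrix near-orthonormal
    # throughout training.
    penalty = 0.0
    for layer in (model.fc1, model.fc2, model.out):
        W = layer.weight
        gram = W @ W.t()
        eye = torch.eye(gram.size(0), device=W.device)
        penalty = penalty + torch.norm(gram - eye, p="fro") ** 2
    return penalty

model = OrthoNet()
criterion = nn.CrossEntropyLoss()
beta = 1e-3  # assumed regularization strength

def training_loss(x, y):
    # Task loss (binary classification) plus the orthogonality regularizer.
    return criterion(model(x), y) + beta * orthogonality_penalty(model)
```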
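The abstract does not state which lower bound is used for the last hidden layer; since orthogonal layers are viewed as an application of Random Projection, the standard Johnson-Lindenstrauss bound is the natural candidate. The sketch below uses scikit-learn's `johnson_lindenstrauss_min_dim` as an assumed stand-in; the sample count (MNIST training-set size) and distortion levels `eps` are illustrative.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

n_samples = 60_000  # e.g., MNIST training-set size (assumed)
for eps in (0.1, 0.2, 0.3):  # tolerated pairwise-distance distortion
    k = johnson_lindenstrauss_min_dim(n_samples, eps=eps)
    print(f"eps={eps}: at least {k} neurons in the last hidden layer")
```

Looser distortion tolerances shrink the required dimension, which is the trade-off between representation size and fidelity of the similarity estimates.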