Paper Title
Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement
Paper Authors
Paper Abstract
Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix faulty behavior of an RNN by replacing the portions associated with that behavior? Recent works on decomposing a fully connected neural network (FCNN) and a convolutional neural network (CNN) into modules have shown the value of engineering deep models in this manner, which is standard in traditional SE but foreign to deep learning models. However, prior works focus on image-based multiclass classification problems and cannot be applied to RNNs due to (a) different layer structures, (b) loop structures, (c) different types of input-output architectures, and (d) usage of both nonlinear and logistic activation functions. In this work, we propose the first approach to decompose an RNN into modules. We study different types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN modules can be reused and replaced in various scenarios. We evaluate our approach against 5 canonical datasets (i.e., Math QA, Brown Corpus, Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset. We found that decomposing a trained model has a small cost (Accuracy: -0.6%, BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced without needing to retrain.
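To make the idea of reuse and replacement concrete, here is a minimal sketch, not the authors' algorithm: a trained vanilla RNN classifier is split into one-vs-rest "modules" (the shared recurrent encoder plus one class's output column), and a subset of those modules is recomposed into a smaller classifier without retraining. All weight shapes, helper names (`encode`, `make_module`, `compose`), and the one-column-per-class decomposition are assumptions made for illustration only.

```python
# Illustrative sketch only (not the paper's method): decompose a trained
# vanilla RNN classifier into per-class modules and recompose a subset.
import numpy as np

rng = np.random.default_rng(0)
H, D, C = 16, 8, 5            # hidden size, input size, number of classes

# Stand-ins for a trained vanilla RNN: recurrent/input weights and an output head.
W_h = rng.normal(size=(H, H)) * 0.1
W_x = rng.normal(size=(D, H)) * 0.1
W_out = rng.normal(size=(H, C))   # one output column per class (assumed layout)

def encode(x_seq):
    """Run the shared recurrent part over a sequence; return the last hidden state."""
    h = np.zeros(H)
    for x_t in x_seq:
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

def make_module(class_idx):
    """A 'module' here = shared encoder + only that class's output column."""
    w_c = W_out[:, class_idx].copy()
    return lambda x_seq: float(encode(x_seq) @ w_c)   # one-vs-rest score

def compose(modules):
    """Reuse: stitch a chosen subset of modules into a new classifier."""
    return lambda x_seq: int(np.argmax([m(x_seq) for m in modules]))

modules = [make_module(c) for c in range(C)]
small_classifier = compose([modules[0], modules[2], modules[4]])  # reuse 3 of 5 classes

x = rng.normal(size=(10, D))    # a dummy length-10 input sequence
print(small_classifier(x))      # predicted index within the reused subset
```

Replacement follows the same pattern: a module whose behavior is faulty could be swapped for a retrained or differently sourced module of the same interface, leaving the other modules untouched.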