Paper Title
Multichannel CNN with Attention for Text Classification
Paper Authors
Paper Abstract
In recent years, approaches based on neural networks have shown remarkable potential for sentence modeling. There are two main neural network structures: the recurrent neural network (RNN) and the convolutional neural network (CNN). An RNN can capture long-term dependencies and store the semantics of previous information in a fixed-size vector. However, the RNN is a biased model, and its ability to extract global semantics is restricted by that fixed-size vector. Alternatively, a CNN can capture the n-gram features of a text by using convolutional filters, but the width of its convolutional filters restricts its performance. To combine the strengths of the two kinds of networks and alleviate their shortcomings, this paper proposes the Attention-based Multichannel Convolutional Neural Network (AMCNN) for text classification. AMCNN uses a bi-directional long short-term memory (BiLSTM) network to encode the history and future information of words into high-dimensional representations, so that the information in both the front and the back of a sentence can be fully expressed. Scalar attention and vectorial attention are then applied to obtain multichannel representations: scalar attention computes word-level importance, and vectorial attention computes feature-level importance. In the classification task, instead of calculating weighted sums, AMCNN uses a CNN structure to capture word relations in the representations generated by the scalar and vectorial attention mechanisms, which effectively extracts the n-gram features of the text. Experimental results on benchmark datasets demonstrate that AMCNN achieves better performance than state-of-the-art methods. In addition, visualization results verify the semantic richness of the multichannel representations.
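The pipeline the abstract describes (BiLSTM encoding, then scalar and vectorial attention forming channels, then an n-gram convolution instead of a weighted sum) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the tensor shapes, random stand-ins for BiLSTM outputs, and the sigmoid form of the vectorial attention are all assumptions for the sake of the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions: T words, d-dimensional hidden states (random stand-ins
# for the BiLSTM encoder outputs the paper uses).
T, d = 5, 8
rng = np.random.default_rng(0)
H = rng.standard_normal((T, d))          # stand-in for BiLSTM hidden states

# Scalar attention: one importance score per word (word-level importance).
w_s = rng.standard_normal(d)
alpha = softmax(H @ w_s)                 # (T,) word-level weights, sum to 1
channel_scalar = H * alpha[:, None]      # re-weighted sequence, shape preserved

# Vectorial attention: one importance score per feature of each word
# (feature-level importance; sketched here as an element-wise sigmoid gate).
W_v = rng.standard_normal((d, d))
channel_vector = H * sigmoid(H @ W_v)    # (T, d) feature-level re-weighting

# Instead of summing the weighted vectors, stack both channels and run an
# n-gram convolution (filter width 3) over each channel, then max-pool over time.
channels = np.stack([channel_scalar, channel_vector])   # (2, T, d)
width = 3
filt = rng.standard_normal((width, d))
feature_maps = np.array([
    [np.tanh(np.sum(ch[t:t + width] * filt)) for t in range(T - width + 1)]
    for ch in channels
])                                        # (2, T - width + 1) n-gram features
pooled = feature_maps.max(axis=1)         # (2,) one pooled feature per channel
```

Note how both attention mechanisms keep the full sequence shape `(T, d)` rather than collapsing it to a single vector, which is what allows the subsequent convolution to extract n-gram features from the attended representations.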