Paper Title
Text Classification with Lexicon from PreAttention Mechanism
Paper Authors
Paper Abstract
A comprehensive and high-quality lexicon plays a crucial role in traditional text classification approaches, as it improves the utilization of linguistic knowledge. Although it is helpful for the task, the lexicon has received little attention in recent neural network models. First, obtaining a high-quality lexicon is not easy: we lack an effective automated lexicon extraction method, and most lexicons are hand-crafted, which is very inefficient for big data. Moreover, there is no effective way to use a lexicon in a neural network. To address these limitations, we propose a Pre-Attention mechanism for text classification in this paper, which can learn the attention of different words according to their effects on the classification task. The words with different attention values can form a domain lexicon. Experiments on three benchmark text classification tasks show that our models achieve competitive results compared with state-of-the-art methods. We obtain 90.5% accuracy on the Stanford Large Movie Review dataset, 82.3% on the Subjectivity dataset, and 93.7% on Movie Reviews. Compared with text classification models without the Pre-Attention mechanism, those with the Pre-Attention mechanism improve accuracy by 0.9%-2.4%, which demonstrates the validity of the Pre-Attention mechanism. In addition, the Pre-Attention mechanism performs well when followed by different types of neural networks (e.g., convolutional neural networks and Long Short-Term Memory networks). For the same dataset, when we use the Pre-Attention mechanism to obtain attention values followed by different neural networks, the words with high attention values have a high degree of coincidence, which demonstrates the versatility and portability of the Pre-Attention mechanism. We can obtain stable lexicons from the attention values, which is an inspiring method of information extraction.
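The abstract describes the Pre-Attention mechanism only at a high level: a layer that learns a per-word attention value before a downstream encoder (a CNN or an LSTM), with high-attention words collected into a domain lexicon. Below is a minimal sketch of that idea, assuming a PyTorch-style implementation; the sigmoid scoring head, the layer sizes, and the lexicon_from_attention helper are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch of a Pre-Attention layer before a downstream encoder (assumed design,
# not the paper's exact architecture).
import torch
import torch.nn as nn


class PreAttentionClassifier(nn.Module):
    """Scores each word before the encoder and scales its embedding by that score."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Pre-Attention head: one scalar attention value per word, in (0, 1).
        self.pre_attention = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())
        # Any downstream encoder can follow (an LSTM here; a CNN works as well).
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)         # (batch, seq, embed)
        attention = self.pre_attention(embedded)     # (batch, seq, 1)
        weighted = embedded * attention               # scale each word by its attention
        _, (hidden, _) = self.encoder(weighted)
        return self.classifier(hidden[-1]), attention.squeeze(-1)


def lexicon_from_attention(model, token_ids, id_to_word, threshold=0.8):
    """Collect words whose learned attention exceeds a threshold (hypothetical helper)."""
    with torch.no_grad():
        _, attention = model(token_ids)
    lexicon = set()
    for sent, scores in zip(token_ids.tolist(), attention.tolist()):
        for tok, score in zip(sent, scores):
            if tok != 0 and score >= threshold:
                lexicon.add(id_to_word[tok])
    return lexicon


if __name__ == "__main__":
    model = PreAttentionClassifier(vocab_size=100)
    batch = torch.randint(1, 100, (4, 12))            # toy batch of token ids
    logits, attn = model(batch)
    print(logits.shape, attn.shape)                   # torch.Size([4, 2]) torch.Size([4, 12])
```

Because the attention value is computed per word before the encoder, the same scoring head can sit in front of either a CNN or an LSTM, which is consistent with the abstract's claim that high-attention words largely coincide across encoder types.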