Paper Title

Cascaded Semantic and Positional Self-Attention Network for Document Classification

Paper Authors

Juyong Jiang, Jie Zhang, Kai Zhang

Paper Abstract

Transformers have shown great success in learning representations for language modelling. However, an open challenge remains: how to systematically aggregate semantic information (word embeddings) with positional (or temporal) information (word order). In this work, we propose a new architecture, the cascaded semantic and positional self-attention network (CSPAN), to aggregate the two sources of information in the context of document classification. CSPAN uses a semantic self-attention layer cascaded with a Bi-LSTM to process the semantic and positional information sequentially, and then adaptively combines them through a residual connection. Compared with commonly used positional encoding schemes, CSPAN exploits the interaction between semantics and word positions in a more interpretable and adaptive manner, and it notably improves classification performance while preserving a compact model size and a high convergence rate. We evaluate the CSPAN model on several benchmark data sets for document classification with careful ablation studies, and demonstrate encouraging results compared with the state of the art.
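
To make the cascaded design concrete, here is a minimal PyTorch sketch of one semantic-then-positional block as the abstract describes it: self-attention over word embeddings, a Bi-LSTM over the attended sequence, and a residual connection combining the two. The module name `CSPANBlock`, the layer sizes, head count, mean pooling, and classifier head are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CSPANBlock(nn.Module):
    # Illustrative sketch only: hyperparameters, pooling, and the classifier
    # head are assumptions, not the authors' implementation.
    def __init__(self, embed_dim=128, num_heads=4, num_classes=5):
        super().__init__()
        # Semantic stage: self-attention over word embeddings.
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads,
                                               batch_first=True)
        # Positional stage: a Bi-LSTM reads the attended sequence in order,
        # injecting word-order (temporal) information.
        self.bilstm = nn.LSTM(embed_dim, embed_dim // 2,
                              bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len, embed_dim)
        sem, _ = self.self_attn(x, x, x)  # semantic self-attention
        pos, _ = self.bilstm(sem)         # sequential/positional modelling
        fused = sem + pos                 # residual connection combines both
        doc = fused.mean(dim=1)           # mean-pool tokens into a doc vector
        return self.classifier(doc)       # document-level logits

model = CSPANBlock()
tokens = torch.randn(2, 32, 128)          # 2 documents, 32 tokens, dim 128
logits = model(tokens)                    # shape: (2, 5)
```

One plausible reading of the abstract's "adaptive combination" is that the residual sum lets the model fall back on purely semantic features whenever word order contributes little to the classification decision.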
