Paper Title
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Paper Authors
Paper Abstract
Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks, including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, the bidirectional Long Short-Term Memory (BLSTM) was the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture, denoted Transformer with BLSTM (TRANS-BLSTM), which integrates a BLSTM layer into each transformer block, yielding a joint modeling framework for the transformer and BLSTM. We show that TRANS-BLSTM models consistently improve accuracy over BERT baselines on GLUE and SQuAD 1.1 experiments. Our TRANS-BLSTM model obtains an F1 score of 94.01% on the SQuAD 1.1 development dataset, which is comparable to the state-of-the-art result.
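The abstract only states that a BLSTM layer is integrated into each transformer block, without specifying where the layer sits or how its output is merged. The sketch below is one plausible reading, not the authors' exact implementation: it appends a BLSTM sublayer after the feed-forward sublayer and merges it with a residual connection and layer normalization. The layer sizes, the BLSTM placement, and the residual merge are all illustrative assumptions.

# Minimal sketch of a transformer block augmented with a bidirectional LSTM,
# under the assumptions stated above (placement and merge are hypothetical).
import torch
import torch.nn as nn


class TransBLSTMBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Bidirectional LSTM: forward/backward states concatenate back to d_model.
        self.blstm = nn.LSTM(d_model, d_model // 2, bidirectional=True,
                             batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Standard self-attention sublayer with residual connection.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + self.dropout(attn_out))
        # Standard position-wise feed-forward sublayer.
        x = self.norm2(x + self.dropout(self.ff(x)))
        # Added BLSTM sublayer (assumed placement and residual merge).
        blstm_out, _ = self.blstm(x)
        return self.norm3(x + self.dropout(blstm_out))


if __name__ == "__main__":
    block = TransBLSTMBlock()
    hidden = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
    print(block(hidden).shape)         # torch.Size([2, 16, 768])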