Paper Title


BURT: BERT-inspired Universal Representation from Twin Structure

Paper Authors

Yian Li, Hai Zhao

Paper Abstract


Pre-trained contextualized language models such as BERT have shown great effectiveness in a wide range of downstream Natural Language Processing (NLP) tasks. However, the effective representations offered by these models target each token inside a sequence rather than the sequence as a whole, and the fine-tuning step takes both sequences as input at one time, leading to unsatisfying representations of sequences at different granularities. In particular, since sentence-level representations are taken as the full training context in these models, performance on lower-level linguistic units (phrases and words) is inferior. In this work, we present BURT (BERT-inspired Universal Representation from Twin Structure), which is capable of generating universal, fixed-size representations for input sequences of any granularity, i.e., words, phrases, and sentences, using a large scale of natural language inference and paraphrase data with multiple training objectives. Our proposed BURT adopts a Siamese network, learning sentence-level representations from natural language inference datasets and word/phrase-level representations from paraphrase datasets, respectively. We evaluate BURT across text similarity tasks of different granularities, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks, where BURT substantially outperforms other representation models on sentence-level datasets and achieves significant improvements in word/phrase-level representation.
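The twin (Siamese) setup described in the abstract can be pictured as a single shared encoder that maps inputs of any granularity to fixed-size vectors, which are then compared directly. Below is a minimal sketch of that idea using a generic BERT encoder with mean pooling and cosine similarity; the model name `bert-base-uncased`, the pooling choice, the `encode` helper, and the example phrases are illustrative assumptions, not BURT's released implementation or training procedure.

```python
# Minimal sketch: a shared ("twin") BERT encoder producing fixed-size
# representations for words, phrases, or sentences, compared by cosine
# similarity. Assumptions: bert-base-uncased weights and mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(texts):
    """Encode a list of texts (any granularity) into fixed-size vectors."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state           # (batch, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()      # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)        # mean pooling -> (batch, dim)

# Twin structure: the same encoder (shared weights) embeds both inputs,
# so a sentence and a phrase live in the same vector space.
a = encode(["a man is playing a guitar"])
b = encode(["guitar"])
print(float(torch.nn.functional.cosine_similarity(a, b)))
```

In a Siamese arrangement like this, training objectives (e.g., classification over NLI pairs or similarity over paraphrase pairs) are applied to the pooled vectors while the encoder weights stay shared, which is what allows one model to serve word-, phrase-, and sentence-level comparisons.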
