Paper Title

Ranking-Enhanced Unsupervised Sentence Representation Learning

Authors

Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh

Abstract

Unsupervised sentence representation learning has progressed through contrastive learning and data augmentation methods such as dropout masking. Despite this progress, sentence encoders are still limited to using only an input sentence when predicting its semantic vector. In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence. Based on this finding, we propose a novel unsupervised sentence encoder, RankEncoder. RankEncoder predicts the semantic vector of an input sentence by leveraging its relationship with other sentences in an external corpus, as well as the input sentence itself. We evaluate RankEncoder on semantic textual similarity (STS) benchmark datasets. From the experimental results, we verify that 1) RankEncoder achieves 80.07% Spearman's correlation, a 1.1% absolute improvement compared to the previous state-of-the-art performance, 2) RankEncoder is universally applicable to existing unsupervised sentence embedding methods, and 3) RankEncoder is specifically effective for predicting the similarity scores of similar sentence pairs.
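The core idea in the abstract — representing a sentence partly by how it ranks the sentences of an external corpus — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`rank_vector`, `rank_similarity`), the centering/normalization steps, and the interpolation weight `lam` are all assumptions; the encoder is stood in for by plain vectors.

```python
import numpy as np

def rank_vector(vec, corpus_vecs):
    """Rank vector of a sentence with respect to an external corpus.

    Each entry is the (centered, unit-normalized) rank of a corpus
    sentence when the corpus is ordered by cosine similarity to `vec`.
    """
    q = vec / np.linalg.norm(vec)
    C = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = C @ q                                     # cosine similarity to each corpus sentence
    ranks = sims.argsort().argsort().astype(float)   # 0 = least similar, n-1 = most similar
    r = ranks - ranks.mean()                         # center so the direction carries the signal
    return r / np.linalg.norm(r)

def rank_similarity(v1, v2, corpus_vecs, lam=0.5):
    """Interpolate base-vector cosine with rank-vector cosine (lam is an assumed hyperparameter)."""
    cos_base = (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    r1 = rank_vector(v1, corpus_vecs)
    r2 = rank_vector(v2, corpus_vecs)
    return lam * cos_base + (1.0 - lam) * (r1 @ r2)
```

Two sentences that order the corpus similarly get similar rank vectors, so the combined score captures neighborhood structure that the base vectors alone may miss.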
