Paper Title

Empowering Sentence Encoders with Prompting and Label Retrieval for Zero-shot Text Classification

Authors

Jimin Hong, Jungsoo Park, Daeyoung Kim, Seongjae Choi, Bokyung Son, Jaewook Kang

Abstract

With contrastive pre-training, sentence encoders are generally optimized to locate semantically similar samples closer to each other in their embedding spaces. In this work, we focus on the potential of their embedding spaces to be readily adapted to zero-shot text classification, as semantically distinct samples are already well-separated. Our framework, RaLP (Retrieval augmented Label Prompts for sentence encoder), encodes prompted label candidates with a sentence encoder, then assigns the label whose prompt embedding has the highest similarity with the input text embedding. In order to compensate for the potentially poorly descriptive labels in their original format, RaLP retrieves sentences that are semantically similar to the original label prompt from external corpora and uses them as additional pseudo-label prompts. RaLP achieves competitive or stronger performance than much larger baselines on various closed-set classification and multiple-choice QA datasets under zero-shot settings. We show that the retrieval component plays a pivotal role in RaLP's success, and its results are robustly attained regardless of verbalizer variations.
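
The pipeline described in the abstract (prompt each label, retrieve similar sentences as pseudo-label prompts, pick the label most similar to the input) can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words `make_toy_encoder` is only a stand-in for a real contrastively pre-trained sentence encoder. The template string, the tiny corpus, the choice of `k`, and averaging similarities over the prompt set are all assumptions made for illustration; the abstract does not specify how the retrieved prompts are aggregated.

```python
import re
import numpy as np

def make_toy_encoder(vocab_texts):
    """Hypothetical stand-in for a sentence encoder: L2-normalized bag of words."""
    vocab = {}
    for text in vocab_texts:
        for tok in re.findall(r"[a-z]+", text.lower()):
            vocab.setdefault(tok, len(vocab))

    def encode(texts):
        mat = np.zeros((len(texts), len(vocab)))
        for i, text in enumerate(texts):
            for tok in re.findall(r"[a-z]+", text.lower()):
                if tok in vocab:
                    mat[i, vocab[tok]] += 1.0
        norms = np.linalg.norm(mat, axis=1, keepdims=True)
        return mat / np.maximum(norms, 1e-12)  # unit rows, so dot product = cosine

    return encode

def classify(text, labels, corpus, encode, template, k=1):
    """Score each label by the mean similarity between the input embedding and
    the label prompt plus its k retrieved pseudo-label prompts (assumed rule)."""
    text_emb = encode([text])[0]
    corpus_emb = encode(corpus)
    scores = []
    for label in labels:
        prompt_emb = encode([template.format(label)])[0]
        # Retrieval step: corpus sentences closest to the label prompt.
        top = np.argsort(-(corpus_emb @ prompt_emb))[:k]
        prompt_set = np.vstack([prompt_emb, corpus_emb[top]])
        scores.append(float(np.mean(prompt_set @ text_emb)))
    return labels[int(np.argmax(scores))], scores

# Toy data, invented for this example.
template = "This text is about {}."
labels = ["sports", "finance"]
corpus = [
    "The sports team won the football match",
    "Finance news: the bank raised interest rates",
    "A recipe for tomato soup",
]
text = "The striker scored in the football match"

encode = make_toy_encoder(corpus + labels + [template, text])
predicted, scores = classify(text, labels, corpus, encode, template, k=1)
print(predicted, scores)
```

In this toy setup the bare label prompts share no words with the input, so the retrieved sentences are what separate the two labels. That mirrors, in a very simplified way, the abstract's point that retrieval compensates for poorly descriptive labels.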
