Paper Title

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Paper Authors

Beliz Gunel, Jingfei Du, Alexis Conneau, Ves Stoyanov

Paper Abstract

State-of-the-art natural language understanding classification models follow two-stages: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using cross-entropy loss. However, the cross-entropy loss has several shortcomings that can lead to sub-optimal generalization and instability. Driven by the intuition that good generalization requires capturing the similarity between examples in one class and contrasting them with examples in other classes, we propose a supervised contrastive learning (SCL) objective for the fine-tuning stage. Combined with cross-entropy, our proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings, without requiring specialized architecture, data augmentations, memory banks, or additional unsupervised data. Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data, and can generalize better to related tasks with limited labeled data.
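
The abstract describes fine-tuning with a weighted combination of cross-entropy and a supervised contrastive term that pulls together same-class examples in a batch and pushes apart examples from other classes. Below is a minimal sketch of such a combined loss, assuming a PyTorch setup; the function name, the `lam`/`tau` values, and the batch-mean reduction are illustrative choices of ours, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of combining cross-entropy with a
# supervised contrastive loss for fine-tuning. Assumes PyTorch: `logits` are
# classifier outputs, `features` are encoder representations (e.g. the [CLS]
# vector), `labels` are integer class ids for the batch.
import torch
import torch.nn.functional as F


def scl_fine_tuning_loss(logits, features, labels, lam=0.9, tau=0.3):
    """Return (1 - lam) * cross-entropy + lam * supervised contrastive loss."""
    ce = F.cross_entropy(logits, labels)

    features = F.normalize(features, dim=1)      # compare in cosine-similarity space
    sim = features @ features.t() / tau          # pairwise similarities, scaled by temperature
    n = labels.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=labels.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over other examples

    # positives: other examples in the batch with the same label
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)             # avoid divide-by-zero
    scl = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    scl = scl.mean()

    return (1 - lam) * ce + lam * scl
```

In this sketch `lam` trades off the two terms and `tau` is the contrastive temperature; both are hyperparameters to be tuned per task rather than fixed values from the paper.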
