Paper Title
Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification
Paper Authors
Paper Abstract
Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant labels from a large candidate set. Most existing LMTC approaches rely on massive human-annotated training data, which are often costly to obtain and suffer from a long-tailed label distribution (i.e., many labels occur only a few times in the training set). In this paper, we study LMTC under the zero-shot setting, which does not require any annotated documents and relies only on label surface names and descriptions. To train a classifier that calculates the similarity score between a document and a label, we propose a novel metadata-induced contrastive learning (MICoL) method. Different from previous text-based contrastive learning techniques, MICoL exploits document metadata (e.g., authors, venues, and references of research papers), which are widely available on the Web, to derive similar document-document pairs. Experimental results on two large-scale datasets show that: (1) MICoL significantly outperforms strong zero-shot text classification and contrastive learning baselines; (2) MICoL is on par with the state-of-the-art supervised metadata-aware LMTC method trained on 10K-200K labeled documents; and (3) MICoL tends to predict more infrequent labels than supervised methods, thus alleviating the deteriorated performance on long-tailed labels.
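To make the abstract's pair-construction idea concrete, below is a minimal Python sketch (using PyTorch) of the two ingredients it describes: deriving positive document-document pairs from shared metadata, and training with a contrastive objective. Everything here is an illustrative assumption rather than the paper's actual method: the `docs` records, the `shares_metadata` rule, and the InfoNCE-style loss are stand-ins for MICoL's meta-path definitions and its real training objective.

```python
import itertools
import torch
import torch.nn.functional as F

# Hypothetical document records: each has text plus metadata fields
# (venue and references), mirroring the metadata types the abstract names.
docs = [
    {"id": 0, "text": "Paper A ...", "venue": "WWW", "references": {3, 7}},
    {"id": 1, "text": "Paper B ...", "venue": "WWW", "references": {2, 7}},
    {"id": 2, "text": "Paper C ...", "venue": "KDD", "references": {9}},
]

def shares_metadata(d1, d2):
    """Treat two documents as a positive pair if they appear in the same
    venue or cite a common reference (one possible metadata relation)."""
    return d1["venue"] == d2["venue"] or bool(d1["references"] & d2["references"])

# Derive positive document-document pairs from metadata alone --
# no human-annotated labels are involved.
positive_pairs = [
    (d1["id"], d2["id"])
    for d1, d2 in itertools.combinations(docs, 2)
    if shares_metadata(d1, d2)
]

def info_nce_loss(anchor_emb, pos_emb, neg_embs, temperature=0.07):
    """InfoNCE-style contrastive loss: pull the metadata-induced positive
    pair together and push in-batch negatives apart. `anchor_emb` and
    `pos_emb` are 1-D document embeddings; `neg_embs` is (n_neg, dim)."""
    pos_sim = F.cosine_similarity(anchor_emb, pos_emb, dim=-1) / temperature
    neg_sim = F.cosine_similarity(anchor_emb.unsqueeze(0), neg_embs, dim=-1) / temperature
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])          # (1 + n_neg,)
    target = torch.zeros(1, dtype=torch.long)                    # positive is index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```

At inference time, the same trained encoder would score a document against each label's surface name and description, ranking labels by similarity; the sketch above only covers the self-supervised training signal, since that is the part the metadata makes possible without labeled documents.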