Rank over Class: The Untapped Potential of Ranking in Natural Language Processing

Paper title

Rank over Class: The Untapped Potential of Ranking in Natural Language Processing

Authors

Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough

Abstract

Text classification has long been a staple within Natural Language Processing (NLP), with applications spanning diverse areas such as sentiment analysis, recommender systems and spam detection. With such a powerful solution, it is often tempting to use it as the go-to tool for all NLP problems since when you are holding a hammer, everything looks like a nail. However, we argue here that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould and that if we instead address them as a ranking problem, we not only improve the model but also achieve better performance. We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences, which are in turn passed into a context aggregating network outputting ranking scores used to determine an ordering of the sequences based on some notion of relevance. We perform numerous experiments on publicly available datasets and investigate the applications of ranking in problems often solved using classification. In an experiment on a heavily skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification, demonstrating the efficacy of text ranking over text classification in certain scenarios.
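
The abstract describes scoring text sequences, ordering them by score, and converting ranking results into classification labels. The sketch below is only an illustration of that general idea, not the authors' implementation: `toy_score` is a hypothetical stand-in for the learned Transformer-based scorer, the margin (hinge) loss is a commonly used pairwise ranking objective that the abstract does not confirm, and the threshold rule in `scores_to_labels` is an assumed way of turning scores into labels.

```python
from typing import Callable, List, Sequence


def pairwise_hinge_loss(score_hi: float, score_lo: float, margin: float = 1.0) -> float:
    """Margin ranking loss for one pair.

    Zero once the item that should rank higher outscores the other
    by at least `margin`; grows linearly otherwise.
    """
    return max(0.0, margin - (score_hi - score_lo))


def rank_texts(texts: Sequence[str], score_fn: Callable[[str], float]) -> List[str]:
    """Order texts from highest to lowest score under score_fn."""
    return sorted(texts, key=score_fn, reverse=True)


def scores_to_labels(scores: Sequence[float], threshold: float) -> List[int]:
    """Assumed conversion rule: label 1 if the score reaches the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical scorer: counts positive words. A real system would use a
# learned model here; this only makes the example runnable.
POSITIVE_WORDS = {"good", "great", "excellent"}


def toy_score(text: str) -> float:
    return float(sum(word in POSITIVE_WORDS for word in text.lower().split()))


if __name__ == "__main__":
    reviews = ["bad film", "great and good", "good"]
    ordered = rank_texts(reviews, toy_score)
    print(ordered)
    print(scores_to_labels([toy_score(r) for r in ordered], threshold=1.0))
```

Training on pairs only requires knowing which of two texts should rank higher, which is one reason a ranking formulation can be attractive when class labels are heavily skewed.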
