Paper Title

Sibylvariant Transformations for Robust Text Classification

Paper Authors

Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, Miryung Kim

Abstract

The vast majority of text transformation techniques in NLP are inherently limited in their ability to expand input space coverage due to an implicit constraint to preserve the original class label. In this work, we propose the notion of sibylvariance (SIB) to describe the broader set of transforms that relax the label-preserving constraint, knowably vary the expected class, and lead to significantly more diverse input distributions. We offer a unified framework to organize all data transformations, including two types of SIB: (1) Transmutations convert one discrete kind into another, and (2) Mixture Mutations blend two or more classes together. To explore the role of sibylvariance within NLP, we implemented 41 text transformations, including several novel techniques like Concept2Sentence and SentMix. Sibylvariance also enables a unique form of adaptive training that generates new input mixtures for the most confused class pairs, challenging the learner to differentiate with greater nuance. Our experiments on six benchmark datasets strongly support the efficacy of sibylvariance for generalization performance, defect detection, and adversarial robustness.
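The Mixture Mutation idea described in the abstract (e.g., SentMix) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name `sent_mix`, the naive period-based sentence splitting, and the sentence-count label weighting are all hypothetical choices made here to show how blending two labeled texts yields a new input with a knowable soft label.

```python
import random


def sent_mix(text_a, label_a, text_b, label_b, num_classes, seed=0):
    """Blend two labeled texts into one sibylvariant training example.

    Sentences from both inputs are shuffled together, and the soft
    label is weighted by each source's share of the sentences.
    Splitting on ". " is a deliberately naive stand-in for a real
    sentence segmenter.
    """
    rng = random.Random(seed)
    sents_a = [s for s in text_a.split(". ") if s]
    sents_b = [s for s in text_b.split(". ") if s]
    mixed = sents_a + sents_b
    rng.shuffle(mixed)
    weight_a = len(sents_a) / (len(sents_a) + len(sents_b))
    soft_label = [0.0] * num_classes
    soft_label[label_a] += weight_a
    soft_label[label_b] += 1.0 - weight_a
    return ". ".join(mixed), soft_label


# Example: mix a positive and a negative review into one example
# whose expected label is a knowable blend of both classes.
text, label = sent_mix(
    "Great film. Loved it", 1,
    "Terrible plot. Boring. Waste of time", 0,
    num_classes=2,
)
```

Because 2 of the 5 sentences come from the positive review, the resulting soft label is [0.6, 0.4] rather than a hard one-hot vector, which is what lets the learner be trained on the "most confused class pairs" mentioned above.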
