Paper Title


Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction

Authors

Ravikiran, Manikandan, Chakravarthi, Bharathi Raja

Abstract


This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset. More specifically, we evaluate the rationale extraction methods Local Interpretable Model-Agnostic Explanations (LIME) \cite{DBLP:conf/kdd/Ribeiro0G16} and Integrated Gradients (IG) \cite{DBLP:conf/icml/SundararajanTY17} for adapting transformer-based offensive language classification models to zero-shot offensive span identification. We find that LIME and IG achieve baseline $F_{1}$ scores of 26.35\% and 44.83\%, respectively. We further study the effect of dataset size and training procedure on the overall accuracy of span identification, and find that both LIME and IG improve significantly with Masked Data Augmentation and Multilabel Training, reaching $F_{1}$ scores of 50.23\% and 47.38\%, respectively. \textit{Disclaimer: This paper contains examples that may be considered profane, vulgar, or offensive. The examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead, they are used only to emphasize the linguistic research challenges.}
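The abstract's core idea is rationale extraction: run a sentence-level classifier, attribute its prediction back to individual tokens, and keep the high-attribution tokens as the offensive span, with no span-level supervision. Below is a minimal pure-Python sketch of that pipeline using Integrated Gradients on a toy sigmoid bag-of-words classifier. This is not the paper's transformer setup; the tokens, weights, and threshold are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(weights, x):
    # Toy sentence-level "offensiveness" classifier: sigmoid of a
    # weighted sum of per-token features (stand-in for a transformer).
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def integrated_gradients(weights, x, baseline, steps=50):
    # Midpoint-rule approximation of IG:
    # attr_i = (x_i - x'_i) * integral_0^1 dF/dx_i (x' + alpha*(x - x')) d(alpha)
    attrs = []
    for i in range(len(x)):
        grad_sum = 0.0
        for k in range(steps):
            alpha = (k + 0.5) / steps
            point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
            p = model(weights, point)
            grad_sum += p * (1.0 - p) * weights[i]  # dF/dx_i for the sigmoid model
        attrs.append((x[i] - baseline[i]) * grad_sum / steps)
    return attrs

# Hypothetical input: one binary presence feature per token.
tokens = ["nee", "oru", "muttal", "da"]  # illustrative romanized Tamil tokens
weights = [0.1, 0.0, 3.0, 0.2]           # made-up classifier weights
x = [1.0, 1.0, 1.0, 1.0]                 # input sentence
baseline = [0.0, 0.0, 0.0, 0.0]          # all-zero ("empty sentence") baseline

attrs = integrated_gradients(weights, x, baseline)
# Zero-shot span: tokens whose attribution clears a fraction of the maximum.
span = [t for t, a in zip(tokens, attrs) if a > 0.5 * max(attrs)]
```

A useful sanity check on any IG implementation is the completeness axiom: the attributions should sum (up to numerical error) to the difference between the model's output at the input and at the baseline.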
