Towards Code-switched Classification Exploiting Constituent Language Resources

Paper Title

Towards Code-switched Classification Exploiting Constituent Language Resources

Authors

Tanvi Dadu, Kartikey Pant

Abstract

Code-switching is a commonly observed communicative phenomenon denoting a shift from one language to another within the same speech exchange. The analysis of code-switched data is often an arduous task owing to the limited availability of data. In this work, we propose converting code-switched data into its constituent high-resource languages to exploit both monolingual and cross-lingual settings. This conversion allows us to leverage the greater resource availability of the constituent languages for multiple downstream tasks. We perform experiments on two downstream tasks, sarcasm detection and hate speech detection, in the English-Hindi code-switched setting. These experiments show increases of 22% and 42.5% in F1-score for sarcasm detection and hate speech detection, respectively, compared to the state of the art.
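The convert-then-classify idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the converters, the classifiers, the toy Hinglish lexicon, and the rule of averaging per-language scores are all hypothetical placeholders. A real system would use translation or transliteration models and classifiers trained on the larger English and Hindi datasets.

```python
from typing import Callable, Dict

# Converts code-switched text into one constituent language (hypothetical stand-in
# for a machine-translation or transliteration model).
Converter = Callable[[str], str]
# Scores monolingual text with a model trained on that language's larger labelled
# data, returning P(label = 1) (hypothetical stand-in for a fine-tuned classifier).
Classifier = Callable[[str], float]


def classify_code_switched(text: str,
                           converters: Dict[str, Converter],
                           classifiers: Dict[str, Classifier],
                           threshold: float = 0.5) -> bool:
    """Convert `text` into each constituent language, score every converted
    version with that language's classifier, and average the scores.
    Averaging is an assumed combination rule, not taken from the paper."""
    if not converters:
        raise ValueError("at least one constituent language is required")
    scores = [classifiers[lang](convert(text)) for lang, convert in converters.items()]
    return sum(scores) / len(scores) >= threshold


# --- Toy demo: every component below is a hypothetical placeholder. ---
TOY_HINGLISH_TO_ENGLISH = {"bahut": "very", "accha": "good", "hai": "is"}


def toy_to_english(text: str) -> str:
    # Word-by-word lexicon lookup; unknown words pass through unchanged.
    return " ".join(TOY_HINGLISH_TO_ENGLISH.get(w, w) for w in text.lower().split())


def toy_english_classifier(text: str) -> float:
    # Keyword rule standing in for a trained English classifier.
    return 0.9 if "good" in text.split() else 0.1


# "hi" path: identity converter (transliteration omitted) and an uninformative score.
converters = {"en": toy_to_english, "hi": lambda t: t}
classifiers = {"en": toy_english_classifier, "hi": lambda t: 0.5}

print(classify_code_switched("movie bahut accha hai", converters, classifiers))  # True
print(classify_code_switched("movie bahut bura hai", converters, classifiers))   # False
```

The point of the structure is that each constituent language gets its own converter and classifier, so models trained on plentiful monolingual data can be reused without any labelled code-switched training data; swapping the averaging rule for a learned combiner would be a natural variation.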
