Paper Title
SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Paper Authors
Abstract
In this paper, we present the results of SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora, annotated with word-level language identification and sentence-level sentiment labels. These corpora comprise 20K and 19K examples, respectively. The sentiment labels are Positive, Negative, and Neutral. SentiMix attracted 89 submissions in total: 61 teams participated in the Hinglish contest, and 28 systems were submitted to the Spanglish competition. The best performance achieved was a 75.0% F1 score for Hinglish and an 80.6% F1 score for Spanglish. We observe that BERT-like models and ensemble methods are the most common and successful approaches among the participants.