Paper title
UPB at SemEval-2020 Task 9: Identifying Sentiment in Code-Mixed Social Media Texts using Transformers and Multi-Task Learning
Paper authors

Paper abstract
Sentiment analysis is a process widely used in opinion mining campaigns conducted today. It has applications in a variety of fields, especially in collecting information about the attitude or satisfaction of users concerning a particular subject. However, the task becomes noticeably more difficult when applied in cultures that tend to combine two languages in order to express ideas and thoughts. By interleaving words from two languages, users can express themselves with ease, but at the cost of making the text far less intelligible both for readers unfamiliar with this practice and for standard opinion mining algorithms. In this paper, we describe the systems developed by our team for SemEval-2020 Task 9, which covers two well-known code-mixed language pairs: Hindi-English and Spanish-English. We address this problem with a solution that takes advantage of several neural network approaches, as well as pre-trained word embeddings. Our approach (multilingual BERT) achieves promising performance on the Hindi-English task, with an average F1-score of 0.6850 registered on the competition leaderboard, ranking our team 16th out of 62 participants. For the Spanish-English task, we obtained an average F1-score of 0.7064, ranking 17th out of 29 participants, by using another multilingual Transformer-based model, XLM-RoBERTa.
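As an illustration only (not the authors' released code), the sketch below shows how a multilingual Transformer such as multilingual BERT or XLM-RoBERTa can be set up for three-way sentiment classification of code-mixed tweets with the Hugging Face transformers library; the model names, label scheme, and example tweet are assumptions for demonstration purposes.

```python
# Minimal sketch: multilingual Transformer for code-mixed sentiment classification.
# Assumptions: Hugging Face transformers, 3 labels (0=negative, 1=neutral, 2=positive).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # e.g. "xlm-roberta-base" for Spanish-English

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Hypothetical code-mixed (Hindi-English) tweet used purely as an example input.
texts = ["yeh movie bahut achhi thi, totally loved it!"]
batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**batch).logits   # shape: (batch_size, 3)
    pred = logits.argmax(dim=-1)     # predicted sentiment class ids
print(pred.tolist())
```

In practice, such a model would be fine-tuned on the labeled SentiMix training data before inference; the snippet only demonstrates the classification head and tokenization flow.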