Paper Title

Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation

Authors

Fajri Koto, Ikhwan Koto

Abstract

Although some linguists (Rusmali et al., 1985; Crouch, 2009) have made fair attempts to define the morphology and syntax of Minangkabau, information processing for this language is still absent due to the scarcity of annotated resources. In this work, we release two Minangkabau corpora, one for sentiment analysis and one for machine translation, harvested and constructed from Twitter and Wikipedia. We conduct the first computational linguistics study of the Minangkabau language, employing classic machine learning and sequence-to-sequence models such as LSTM and Transformer. Our first experiments show that classification performance on Minangkabau text drops significantly when it is tested with a model trained on Indonesian. In the machine translation experiment, by contrast, a simple word-to-word translation using a bilingual dictionary outperforms the LSTM and Transformer models in terms of BLEU score.
