Title

TeleCrowd: A Crowdsourcing Approach to Create Informal to Formal Text Corpora

Authors

Vahid Masoumi, Mostafa Salehi, Hadi Veisi, Golnoush Haddadian, Vahid Ranjbar, Mahsa Sahebdel

Abstract

Crowdsourcing has recently been widely used as an alternative to traditional annotation, which is costly and usually done by experts. However, crowdsourcing tasks are not interesting in themselves; therefore, combining tasks with games will increase both participants' motivation and their engagement. In this paper, we propose a gamified crowdsourcing platform called TeleCrowd, built on Telegram Messenger, which uses Telegram's social power as a base platform and facilitator for accomplishing crowdsourcing projects. Furthermore, to evaluate the performance of the proposed platform, we ran an experimental crowdsourcing project consisting of 500 informal Persian sentences, in which participants were asked either to provide candidates, i.e. formal equivalents of the sentences, or to rate other candidates by upvoting or downvoting them. In this study, participants submitted 2,700 candidates and 21,000 votes, and a parallel dataset was built by selecting, for each sentence, the candidate with the highest points (the sum of its upvotes and downvotes) as the best candidate. In the evaluation, a BLEU score of 0.54 was achieved on the collected dataset, which shows that the proposed platform can be used to create large corpora. The platform is also highly efficient in terms of time and cost compared with related work: the whole project took 28 days and cost 40 dollars.
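The selection step described in the abstract (keep, for each informal sentence, the candidate with the most points) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Candidate` record, the `build_parallel_corpus` function, and the scoring rule (each upvote counts +1 and each downvote counts -1, so "the sum of upvotes and downvotes" is treated as a net score) are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Candidate:
    """A proposed formal rewrite of one informal sentence (hypothetical schema)."""
    sentence_id: int
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def points(self) -> int:
        # Assumption: an upvote is worth +1 and a downvote -1,
        # so the summed vote total equals upvotes minus downvotes.
        return self.upvotes - self.downvotes


def build_parallel_corpus(
    informal: Dict[int, str], candidates: Iterable[Candidate]
) -> List[Tuple[str, str]]:
    """Pair each informal sentence with its highest-scoring formal candidate.

    Sentences that received no candidates are skipped.
    """
    # Group candidates by the informal sentence they rewrite.
    by_sentence: Dict[int, List[Candidate]] = defaultdict(list)
    for cand in candidates:
        by_sentence[cand.sentence_id].append(cand)

    corpus: List[Tuple[str, str]] = []
    for sid, source in informal.items():
        pool = by_sentence.get(sid)
        if not pool:
            continue
        # max() returns the first maximal element, so ties go to the
        # candidate that appears first in the input order.
        best = max(pool, key=lambda c: c.points)
        corpus.append((source, best.text))
    return corpus


if __name__ == "__main__":
    # Toy English placeholders, not data from the paper.
    informal = {1: "gonna head home now", 2: "dunno what u mean"}
    cands = [
        Candidate(1, "I am going to head home now.", upvotes=5, downvotes=1),
        Candidate(1, "I'm going home.", upvotes=2, downvotes=0),
        Candidate(2, "I do not know what you mean.", upvotes=3, downvotes=0),
    ]
    for pair in build_parallel_corpus(informal, cands):
        print(pair)
```

With these toy inputs the first sentence keeps "I am going to head home now." (4 points versus 2). How the real platform breaks ties or filters low-vote candidates is not stated in the abstract, so the first-in-input-order rule here is only one plausible choice.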

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源