Paper title

Analysis of the quotation corpus of the Russian Wiktionary

Authors

Smirnov, A., Levashova, T., Karpov, A., Kipyatkova, I., Ronzhin, A., Krizhanovsky, A., Krizhanovsky, N.

Abstract

A quantitative evaluation of quotations in the Russian Wiktionary was carried out using a Wiktionary parser developed by the authors. The number of quotations in the dictionary was found to be growing fast (51,500 in 2011 and 62,000 in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary, for which tables related to quotations were designed. A histogram was built showing how quotations are distributed over the years in which the cited literary works were written. An attempt was made to explain the features of this histogram by relating them to the years associated with the most popular and most frequently cited (in the Russian Wiktionary) writers of the nineteenth century. It was also found that more than one-third of all quotations (example sentences) in the Russian Wiktionary were taken from the Russian National Corpus by the editors of Wiktionary entries.
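The abstract mentions three technical steps: storing quotations in dedicated database tables, building a histogram of quotations by the year the cited work was written, and measuring the share of quotations taken from the Russian National Corpus. The Python sketch below shows one way such an analysis could be expressed with SQLite. It is illustrative only: the tables `source` and `quotation`, their columns, the helper functions, the use of `year_from` for dating, and the decade binning are all hypothetical assumptions, not the authors' actual schema or method.

```python
import sqlite3
from collections import Counter

# Hypothetical schema (an assumption): the abstract says quotation tables were
# designed but does not describe them, so these names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL            -- e.g. a corpus label such as 'RNC'
);
CREATE TABLE IF NOT EXISTS quotation (
    id        INTEGER PRIMARY KEY,
    text      TEXT NOT NULL,
    author    TEXT,
    year_from INTEGER,                   -- year the cited work was written
    year_to   INTEGER,                   -- same as year_from for a single year
    source_id INTEGER REFERENCES source(id)
);
"""


def add_quotation(conn, text, author=None, year_from=None, year_to=None, source=None):
    """Insert one quotation, creating its source row on first use (hypothetical helper)."""
    source_id = None
    if source is not None:
        conn.execute("INSERT OR IGNORE INTO source(name) VALUES (?)", (source,))
        source_id = conn.execute(
            "SELECT id FROM source WHERE name = ?", (source,)
        ).fetchone()[0]
    conn.execute(
        "INSERT INTO quotation(text, author, year_from, year_to, source_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (text, author, year_from, year_to if year_to is not None else year_from, source_id),
    )


def year_histogram(conn, bin_size=10):
    """Count dated quotations per bin (decades by default), keyed by the bin's first year.

    Assumption: each quotation is dated by year_from; undated quotations are skipped.
    """
    counts = Counter()
    for (year,) in conn.execute(
        "SELECT year_from FROM quotation WHERE year_from IS NOT NULL"
    ):
        counts[year - year % bin_size] += 1
    return dict(sorted(counts.items()))


def share_from_source(conn, source_name):
    """Fraction of all quotations whose source row has the given name."""
    total = conn.execute("SELECT COUNT(*) FROM quotation").fetchone()[0]
    if total == 0:
        return 0.0
    hits = conn.execute(
        "SELECT COUNT(*) FROM quotation q JOIN source s ON q.source_id = s.id "
        "WHERE s.name = ?",
        (source_name,),
    ).fetchone()[0]
    return hits / total
```

A caller would open a connection, run `conn.executescript(SCHEMA)`, load the parsed quotations with `add_quotation`, and then call `year_histogram(conn)` and `share_from_source(conn, "RNC")`. Keeping the source in its own table lets the corpus share be computed with a single join rather than by matching free-text citation strings.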