Paper title
'I Updated the <ref>': The Evolution of References in the English Wikipedia and the Implications for Altmetrics
Paper authors
Paper abstract
With this work, we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase of reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, arXiv ID), and (2) most of the reference curation work is done by registered human editors (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and that our dataset can be pivotal for such an effort.
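The abstract names four reference actions (creation, modification, deletion, reinsertion) but does not describe how references are matched across revisions. Purely as an illustration, the Python sketch below labels those actions over a sequence of revisions. It is not the authors' method: it assumes each revision has already been reduced to a list of reference strings (for example, the contents of its `<ref>` tags), and it swaps in a simple string-similarity heuristic based on `difflib.SequenceMatcher` with a hypothetical threshold of 0.8. The names `track_references`, `best_match` and `SIMILARITY_THRESHOLD` are likewise hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical cutoff (an assumption, not taken from the paper): two reference
# strings at least this similar are treated as versions of the same reference.
SIMILARITY_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio in [0, 1] for two reference strings."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(candidates: dict[int, str], text: str) -> int | None:
    """Return the id of the candidate most similar to text, if above the cutoff."""
    if not candidates:
        return None
    ref_id, old_text = max(candidates.items(), key=lambda kv: similarity(kv[1], text))
    return ref_id if similarity(old_text, text) >= SIMILARITY_THRESHOLD else None


def track_references(revisions: list[list[str]]) -> list[tuple[int, int, str]]:
    """Label reference actions across an ordered list of revisions.

    Each revision is a list of reference strings. Returns (revision index,
    reference id, action) tuples, where action is "creation", "modification",
    "deletion" or "reinsertion". Unchanged references produce no event.
    """
    events: list[tuple[int, int, str]] = []
    live: dict[int, str] = {}     # references present in the previous revision
    deleted: dict[int, str] = {}  # last known text of removed references
    next_id = 0

    for rev_idx, refs in enumerate(revisions):
        unmatched = dict(live)  # previous references not yet paired up
        current: dict[int, str] = {}
        for text in refs:
            action = None
            # 1. Identical text to a live reference: unchanged, no event.
            ref_id = next((rid for rid, old in unmatched.items() if old == text), None)
            # 2. Similar to a live reference: modification.
            if ref_id is None:
                ref_id = best_match(unmatched, text)
                if ref_id is not None:
                    action = "modification"
            # 3. Similar to a previously deleted reference: reinsertion.
            if ref_id is None:
                ref_id = best_match(deleted, text)
                if ref_id is not None:
                    action = "reinsertion"
                    del deleted[ref_id]
            # 4. Otherwise it is a newly created reference.
            if ref_id is None:
                ref_id, action = next_id, "creation"
                next_id += 1
            unmatched.pop(ref_id, None)
            current[ref_id] = text
            if action is not None:
                events.append((rev_idx, ref_id, action))
        # Live references left unmatched were removed in this revision.
        for ref_id, old in unmatched.items():
            deleted[ref_id] = old
            events.append((rev_idx, ref_id, "deletion"))
        live = current

    return events
```

For example, a history in which one reference is created, later extended with a page number, removed and then restored yields a creation, a modification, a deletion and a reinsertion event for that reference. A greedy pairwise heuristic like this one is easy to follow but can mis-pair heavily rewritten or duplicated references, which is one reason a dedicated identification method and a validation campaign such as the one described above are needed.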