Paper Title
Unsupervised Anomaly Detection in Journal-Level Citation Networks
Authors
Abstract
The Journal Impact Factor is a popular metric for assessing the quality of a journal in academia. The number of citations a journal receives is a crucial factor in determining its impact factor, and this count can be manipulated in multiple ways. It is therefore important to detect citation anomalies in order to identify manipulation and inflation of the impact factor. A citation network models the citation relationships between journals as a directed graph. Detecting anomalies in a citation network is a challenging task with several applications, including spotting citation cartels and citation stacking and understanding the intentions behind citations. In this paper, we present a novel approach to detecting anomalies in a journal-level scientific citation network and compare its results with those of existing graph anomaly detection algorithms. Because proper ground truth is lacking, we introduce a journal-level citation anomaly dataset consisting of synthetically injected citation anomalies and use it to evaluate our methodology. Our method predicts anomalous citation pairs with a precision of 100% and an F1-score of 86%. We further categorize the detected anomalies into various types and reason about their possible causes. We also analyze our model on the Microsoft Academic Search dataset, a real-world citation dataset, and interpret our results through a case study in which our results resemble the citation and SCImago Journal Rank (SJR) rating-change charts, indicating the usefulness of our method. Finally, we design the "Journal Citation Analysis Tool", an interactive web portal that, given a citation network as input, shows journal-level anomalous citation patterns and helps users analyze the citation patterns of a given journal over the years.
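As a rough illustration of two ideas in the abstract, the sketch below represents a journal-level citation network as weighted directed edges and scores predicted anomalous citation pairs against injected ones. It is not the authors' method: the function names, journal labels, and toy data are all hypothetical, and the numbers are chosen only to show how a precision of 1.0 can coexist with an F1-score near 0.86 when some injected anomalies are missed.

```python
from collections import Counter


def build_citation_graph(citations):
    """Aggregate (citing_journal, cited_journal) records into weighted
    directed edges: edge -> number of citations along that edge."""
    return Counter(citations)


def precision_recall_f1(predicted_pairs, injected_pairs):
    """Score predicted anomalous pairs against synthetically injected ones."""
    predicted, injected = set(predicted_pairs), set(injected_pairs)
    true_positives = len(predicted & injected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data (not from the paper).
citations = [("J1", "J2"), ("J1", "J2"), ("J2", "J1"), ("J3", "J1")]
graph = build_citation_graph(citations)

# Four injected anomalies, three of which are detected, with no false alarms.
injected = {("J1", "J2"), ("J2", "J3"), ("J3", "J1"), ("J4", "J1")}
predicted = {("J1", "J2"), ("J2", "J3"), ("J3", "J1")}
precision, recall, f1 = precision_recall_f1(predicted, injected)

print(graph[("J1", "J2")])          # citations from J1 to J2
print(precision, recall, round(f1, 3))
```

In this toy case every predicted pair is a true anomaly, so precision is 1.0, while one injected pair goes undetected, which lowers recall and therefore the F1-score.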