SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction

Paper title

SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction

Authors

Thomas van Dongen, Gideon Maillette de Buy Wenniger, Lambert Schomaker

Abstract

Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small training data sets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for number of citations prediction.
