Paper Title

paper2repo: GitHub Repository Recommendation for Academic Papers

Authors

Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, Tarek Abdelzaher

Abstract

GitHub has become a popular social application platform, where a large number of users post their open-source projects. In particular, an increasing number of researchers release repositories of source code related to their research papers in order to attract more people to follow their work. Motivated by this trend, we describe a novel item-item cross-platform recommender system, paper2repo, that recommends relevant repositories on GitHub that match a given paper in an academic search system such as Microsoft Academic. The key challenge is to identify the similarity between an input paper and its related repositories across the two platforms, without the benefit of human labeling. Towards that end, paper2repo integrates text encoding and constrained graph convolutional networks (GCN) to automatically learn and map the embeddings of papers and repositories into the same space, where proximity offers the basis for recommendation. To make our method more practical in real-life systems, labels used for model training are computed automatically from features of user actions on GitHub. In machine learning, such automatic labeling is often called distant supervision. To the authors' knowledge, this is the first distant-supervised cross-platform (paper to repository) matching system. We evaluate the performance of paper2repo on real-world data sets collected from GitHub and Microsoft Academic. Results demonstrate that it outperforms other state-of-the-art recommendation methods.
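
The abstract's idea that "proximity offers the basis for recommendation" can be illustrated with a minimal sketch. It assumes that paper and repository embeddings already live in one shared space and that proximity is measured by cosine similarity. The abstract does not state which similarity measure paper2repo uses, so the metric, the function names `cosine_similarity` and `recommend`, and the toy vectors below are all illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def recommend(paper_vec, repo_vecs, k=2):
    """Return the names of the k repositories closest to the paper embedding.

    repo_vecs maps a (hypothetical) repository name to its embedding.
    """
    ranked = sorted(
        repo_vecs,
        key=lambda name: cosine_similarity(paper_vec, repo_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up embeddings, standing in for the output of a trained encoder.
paper_embedding = [0.9, 0.1, 0.0]
repo_embeddings = {
    "repo_a": [0.8, 0.2, 0.1],
    "repo_b": [0.0, 1.0, 0.0],
    "repo_c": [0.0, 0.0, 1.0],
}

top = recommend(paper_embedding, repo_embeddings, k=2)
print(top)
```

In this toy setup, `repo_a` ranks first because its vector points in nearly the same direction as the paper's. In a real system the embeddings would come from the trained text encoder and GCN rather than being written by hand.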
