Paper Title
A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub
Authors
Abstract
With the ever-increasing amount of open-source software, the popularity of services like GitHub that facilitate code reuse, and common misconceptions about open-source licensing, the problem of license violations in code is becoming increasingly prominent. In this study, we compile an extensive corpus of popular Java projects from GitHub, search it for code clones, and perform an original analysis of possible code borrowing and license violations at the level of code fragments. We chose Java because of its popularity in industry, where the plagiarism problem is especially relevant because of possible legal action. We analyze and discuss the distribution of 94 different discovered and manually evaluated licenses across files and projects, differences in the licensing of files, the distribution of potential code borrowing between licenses, various types of possible license violations, the most violated licenses, etc. Studying possible license violations in specific blocks of code, we have discovered that 29.6% of these blocks might be involved in potential code borrowing and 9.4% could potentially violate the original licenses.