Code Duplication and Reuse in Jupyter Notebooks

Paper Title

Code Duplication and Reuse in Jupyter Notebooks

Authors

Andreas Koenzen, Neil Ernst, Margaret-Anne Storey

Abstract

Duplicating one's own code makes it faster to write software. This expediency is particularly valuable for users of computational notebooks. Duplication allows notebook users to quickly test hypotheses and iterate over data. In this paper, we explore how much, how and from where code duplication occurs in computational notebooks, and identify potential barriers to code reuse. Previous work in the area of computational notebooks describes developers' motivations for reuse and duplication but does not show how much reuse occurs or which barriers they face when reusing code. To address this gap, we first analyzed GitHub repositories for code duplicates contained in a repository's Jupyter notebooks, and then conducted an observational user study of code reuse, where participants solved specific tasks using notebooks. Our findings reveal that repositories in our sample have a mean self-duplication rate of 7.6%. However, in our user study, few participants duplicated their own code, preferring to reuse code from online sources.
