Paper Title
Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Paper Authors
Paper Abstract
Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measure shifts in sensemaking behaviors over time -- between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process and thus our ability to design tools to fully support sensemaking. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking spectrum. Finally, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.