Paper Title

Detecting and Characterizing Bots that Commit Code

Authors

Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus

Abstract

Background: Some developer activities traditionally performed manually, such as making code commits and opening, managing, or closing issues, are increasingly automated in many OSS projects. Specifically, such activities are often performed by tools that react to events or run at specific times. We refer to such automation tools as bots. In many software mining scenarios related to developer productivity or code quality, it is desirable to identify bots in order to separate their actions from those of individuals. Aim: To find an automated way of identifying bots and the code committed by them, and to characterize the types of bots based on their activity patterns. Method and Result: We propose BIMAN, a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the commits. On our test data, the AUC-ROC value was 0.9. We also characterized these bots based on the time patterns of their code commits and the types of files modified, and found that they primarily work with documentation files and web pages, and that these files are most prevalent in HTML and JavaScript ecosystems. We have compiled a shareable dataset containing detailed information about the 461 bots we found (all of which have more than 1000 commits) and the 13,762,430 commits they created.
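
To make the author-name signal concrete, here is a minimal sketch of what a name-based check might look like. It is an illustration only, not BIMAN's actual rule: the function `looks_like_bot`, the token rule, and the sample author strings are all assumptions introduced here.

```python
import re

# Hypothetical heuristic (an assumption, not taken from the paper):
# flag an author name when one of its alphanumeric tokens is "bot"
# or ends in "bot", as in "dependabot".
_TOKEN_SPLIT = re.compile(r"[^a-z0-9]+")


def looks_like_bot(author_name: str) -> bool:
    """Return True if any token of the lowercased name ends with 'bot'."""
    tokens = [t for t in _TOKEN_SPLIT.split(author_name.lower()) if t]
    return any(t.endswith("bot") for t in tokens)


# Illustrative author strings, not drawn from the paper's dataset.
authors = ["dependabot[bot]", "Jane Abbott", "greenkeeper-bot", "Pieter Botha"]
flagged = [a for a in authors if looks_like_bot(a)]
```

A rule like this is cheap but error-prone: it would also flag a human surname such as "Talbot", and it misses bots whose names carry no such cue. The abstract lists commit messages, modified files, and associated projects as further signals alongside author names.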
