Title
Dev2vec: Representing Domain Expertise of Developers in an Embedding Space
Authors
Abstract
Accurate assessment of developers' domain expertise is important for assigning the right candidate to contribute to a project or to fill a job role. Because potential candidates can come from a large pool, automating this assessment is a desirable goal. While previous methods have had some success within a single software project, assessing a developer's domain expertise from contributions across multiple projects is more challenging. In this paper, we employ doc2vec to represent the domain expertise of developers as embedding vectors. These vectors are derived from different sources that contain evidence of developers' expertise, such as the descriptions of the repositories to which they contributed, their issue-resolving history, and the API calls in their commits. We name this approach dev2vec and demonstrate its effectiveness in representing the technical specialization of developers. Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods and improves the F1-score by up to 21%. Moreover, our findings suggest that the "issue-resolving history" of developers is the most informative source for representing their domain expertise in an embedding space.