Paper Title

Semantically-enhanced Topic Recommendation System for Software Projects

Authors

Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori

Abstract

Software-related platforms enable their users to collaboratively label software entities with topics. Repositories tagged with relevant topics can facilitate various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility, which in turn improves the outcome of tasks such as browsing, searching, navigating, and organizing repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. There have therefore been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. We propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first operates solely on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second, a predictive model, assumes that no topics are available for a repository and therefore predicts the relevant topics from both the textual information of a software project and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. The experimental results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of the ASR and MAP metrics, respectively.
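
To make the idea behind the first recommender more concrete, the following is a minimal sketch of how topics already assigned to a repository could be expanded through relationships stored in a knowledge graph. It is not the authors' implementation: the triples, the relation names, the one-hop counting score, and the function names `build_neighbors` and `recommend_topics` are all assumptions introduced purely for illustration, and none of the data comes from SED-KGraph.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (source topic, relation, target topic).
# These triples are illustrative only and are not taken from SED-KGraph.
TRIPLES = [
    ("flask", "is-a", "web-framework"),
    ("flask", "written-in", "python"),
    ("django", "is-a", "web-framework"),
    ("django", "written-in", "python"),
    ("web-framework", "used-for", "web-development"),
]


def build_neighbors(triples):
    """Map each source topic to the set of topics it points to."""
    neighbors = defaultdict(set)
    for source, _relation, target in triples:
        neighbors[source].add(target)
    return neighbors


def recommend_topics(assigned, neighbors, top_k=3):
    """Rank unassigned topics by how many assigned topics link to them.

    Assumed scoring rule: one point per assigned topic with a direct
    (one-hop) edge to the candidate. Ties are broken alphabetically.
    """
    scores = defaultdict(int)
    for topic in assigned:
        for related in neighbors.get(topic, ()):
            if related not in assigned:
                scores[related] += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _score in ranked[:top_k]]


neighbors = build_neighbors(TRIPLES)
print(recommend_topics({"flask", "django"}, neighbors))
```

In this toy setup, a repository tagged only with `flask` and `django` would be offered the topics both of them link to. A real system would presumably weight relation types differently and follow longer paths, which this sketch deliberately leaves out.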
