Paper Title
Automated Variable Renaming: Are We There Yet?
Paper Authors
Paper Abstract
Identifiers, such as method and variable names, form a large portion of source code. Therefore, low-quality identifiers can substantially hinder code comprehension. To support developers in using meaningful identifiers, several (semi-)automatic techniques have been proposed, most of them data-driven (e.g., statistical language models, deep learning models) or based on static code analysis. Still, few empirical studies have examined how effectively such techniques recommend meaningful identifiers to developers, recommendations that could in turn result in rename refactoring operations. We present a large-scale study investigating the potential of data-driven approaches to support automated variable renaming. We experiment with three state-of-the-art techniques: a statistical language model and two deep learning (DL) based models. The three approaches have been trained and tested on three datasets we built with the goal of evaluating their ability to recommend meaningful variable identifiers. Our quantitative and qualitative analyses show that, under specific conditions, such techniques can provide valuable recommendations and are ready to be integrated into rename refactoring tools. Nonetheless, our results also highlight limitations of the experimented approaches that call for further research in this field.