Prioritizing documentation effort: Can we do better?

Paper title

Prioritizing documentation effort: Can we do better?

Authors

Liu, Shiran; Guo, Zhaoqiang; Li, Yanhui; Lu, Hongmin; Chen, Lin; Xu, Lei; Zhou, Yuming; Xu, Baowen

Abstract

Code documentation is essential for software quality assurance, but due to time or economic pressures, developers are often unable to write documentation for all modules in a project. Recently, a supervised artificial neural network (ANN) approach was proposed to prioritize important modules for documentation effort. However, as a supervised approach, it needs labeled training data to train the prediction model, which may not be easy to obtain in practice. Furthermore, it is unclear whether the ANN approach is generalizable, as it was evaluated on only several small data sets. In this paper, we propose an unsupervised approach based on PageRank to prioritize documentation effort. This approach identifies "important" modules based only on the dependence relationships between modules in a project. As a result, the PageRank approach does not need any training data to build the prediction model. To evaluate the effectiveness of the PageRank approach, we use six additional large data sets to conduct the experiments, in addition to the same data sets collected from open-source projects as used in prior studies. The experimental results show that the PageRank approach is superior to the state-of-the-art ANN approach in prioritizing important modules for documentation effort. In particular, due to its simplicity and effectiveness, we advocate that the PageRank approach should be used as an easy-to-implement baseline in future research on documentation effort prioritization, and any new approach should be compared with it to demonstrate its effectiveness.
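To make the idea of ranking modules purely from their dependence relationships more concrete, the following is a minimal sketch of PageRank by power iteration over a small module dependency graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the dependency graph is built, whether edges are weighted, or which damping factor is used. The function names `pagerank` and `prioritize`, the damping value of 0.85, the edge direction (from a module to the modules it depends on), and the example module names are all hypothetical.

```python
from typing import Dict, List


def pagerank(deps: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Score modules by PageRank over a dependency graph.

    `deps` maps each module to the modules it depends on, so an edge
    points from the dependent module to the module being used.
    Edge direction and damping value are assumptions of this sketch.
    """
    # Collect every module, including ones that appear only as targets.
    nodes = sorted(set(deps) | {t for ts in deps.values() for t in ts})
    n = len(nodes)
    if n == 0:
        return {}

    rank = {m: 1.0 / n for m in nodes}
    for _ in range(max_iter):
        # Rank held by modules with no outgoing edges is spread uniformly.
        dangling = sum(rank[m] for m in nodes if not deps.get(m))
        new = {m: (1.0 - damping) / n + damping * dangling / n for m in nodes}

        # Each module passes an equal share of its rank to its dependencies.
        for src in nodes:
            targets = deps.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share

        delta = sum(abs(new[m] - rank[m]) for m in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def prioritize(deps: Dict[str, List[str]]) -> List[str]:
    """Return modules ordered from highest to lowest PageRank score."""
    scores = pagerank(deps)
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical four-module project used only for illustration.
example_deps = {
    "Cli": ["App", "Utils"],
    "App": ["Parser", "Utils"],
    "Parser": ["Utils"],
    "Utils": [],
}
ranking = prioritize(example_deps)
print(ranking)
```

In this toy graph the module that every other module depends on ends up at the top of the list, and the module nothing depends on ends up at the bottom, which matches the intuition that heavily used modules are the ones most worth documenting first. No labeled training data is involved at any step.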
