Learning to Improve Code Efficiency

Paper Title

Learning to Improve Code Efficiency

Authors

Binghong Chen, Daniel Tarlow, Kevin Swersky, Martin Maas, Pablo Heiber, Ashish Naik, Milad Hashemi, Parthasarathy Ranganathan

Abstract

Improvements in the performance of computing systems, driven by Moore's Law, have transformed society. As such hardware-driven gains slow down, it becomes even more important for software developers to focus on performance and efficiency during development. While several studies have demonstrated the potential from such improved code efficiency (e.g., 2x better generational improvements compared to hardware), unlocking these gains in practice has been challenging. Reasoning about algorithmic complexity and the interaction of coding patterns on hardware can be challenging for the average programmer, especially when combined with pragmatic constraints around development velocity and multi-person development. This paper seeks to address this problem. We analyze a large competitive programming dataset from the Google Code Jam competition and find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions. We propose using machine learning to automatically provide prescriptive feedback in the form of hints, to guide programmers towards writing high-performance code. To automatically learn these hints from the dataset, we propose a novel discrete variational auto-encoder, where each discrete latent variable represents a different learned category of code-edit that increases performance. We show that this method represents the multi-modal space of code efficiency edits better than a sequence-to-sequence baseline and generates a distribution of more efficient solutions.
