Paper title
Deep Data Flow Analysis
Paper authors
Paper abstract
Compiler architects increasingly look to machine learning when building heuristics for compiler optimization. The promise of automatic heuristic design, freeing the compiler engineer from the complex interactions of program, architecture, and other optimizations, is alluring. However, most machine learning methods cannot replicate even the simplest of the abstract interpretations of data flow analysis that are critical to making good optimization decisions. This must change for machine learning to become the dominant technology in compiler heuristics. To this end, we propose ProGraML - Program Graphs for Machine Learning - a language-independent, portable representation of whole-program semantics for deep learning. To benchmark current and future learning techniques for compiler analyses, we introduce an open dataset of 461k Intermediate Representation (IR) files for LLVM, covering five source programming languages, and 15.4M corresponding data flow results. We formulate data flow analysis as a message passing neural network (MPNN) and show that, using ProGraML, standard analyses can be learned, yielding improved performance on downstream compiler optimization tasks.
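To make the link between data flow analysis and message passing concrete, the following is a minimal sketch, not the ProGraML model itself. It computes forward reachability on a small control-flow graph by repeating a message, aggregate, and update step until a fixed point is reached. The function name, the example graph, and the hand-written max-based update rule are all assumptions chosen for illustration; an MPNN would replace the fixed update rule with learned functions.

```python
def reachability_by_message_passing(edges, num_nodes, root):
    """Mark every node reachable from `root` using repeated rounds of
    neighbour-to-neighbour messages (a hand-written, non-learned analogue
    of message passing; hypothetical helper for illustration only)."""
    # Predecessor lists: messages flow along edges from src to dst.
    preds = {v: [] for v in range(num_nodes)}
    for src, dst in edges:
        preds[dst].append(src)

    # Node state: 1 if known reachable, 0 otherwise. Only the root starts at 1.
    state = [0] * num_nodes
    state[root] = 1

    steps = 0
    while True:
        # Message + aggregate: each node takes the max over its predecessors' states.
        incoming = [
            max((state[u] for u in preds[v]), default=0)
            for v in range(num_nodes)
        ]
        # Update: a node stays reachable once marked, or becomes reachable
        # if any predecessor is reachable.
        new_state = [max(s, m) for s, m in zip(state, incoming)]
        if new_state == state:  # fixed point: no node changed this round
            break
        state = new_state
        steps += 1
    return state, steps


# Hypothetical control-flow graph: 0 -> 1 -> 2, 1 -> 3, 4 -> 3 (node 4 is dead code).
cfg_edges = [(0, 1), (1, 2), (1, 3), (4, 3)]
reachable, steps = reachability_by_message_passing(cfg_edges, num_nodes=5, root=0)
print(reachable, steps)
```

In this sketch the number of rounds needed grows with the longest path from the root, which is one reason iterative data flow problems are a demanding test for models that run a fixed number of message passing steps.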