Paper title
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Paper authors
Paper abstract
The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization heuristics, but there remains a gap between what state-of-the-art methods achieve and the performance of an optimal heuristic. Closing this gap requires improvements in two key areas: a representation that accurately captures the semantics of programs, and a model architecture with sufficient expressiveness to reason about this representation. We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low-level, language-agnostic, and portable format, together with machine learning models capable of performing complex downstream tasks over these graphs. The ProGraML representation is a directed attributed multigraph that captures control, data, and call relations, and summarizes instruction and operand types and ordering. Message Passing Neural Networks propagate information through this structured representation, enabling whole-program or per-vertex classification tasks. ProGraML provides a general-purpose program representation that equips learnable models to perform the types of program analysis that are fundamental to optimization. To this end, we first evaluate the performance of our approach on a suite of traditional compiler analysis tasks: control flow reachability, dominator trees, data dependencies, variable liveness, and common subexpression detection. On a benchmark dataset of 250k LLVM-IR files covering six source programming languages, ProGraML achieves an average F1 score of 94.0, significantly outperforming state-of-the-art approaches. We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
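To make the two ingredients of the abstract concrete, the following is a minimal sketch of a directed attributed multigraph with typed edges, plus one round of message passing over it. It is an illustration only, not the ProGraML implementation: the names `ProgramGraph` and `message_passing_step`, the scalar vertex states, and the fixed per-edge-type weights are all assumptions made for this example. An actual Message Passing Neural Network would use learned vector states and learned update functions.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical edge-type labels mirroring the three relations named in the abstract.
CONTROL, DATA, CALL = "control", "data", "call"


@dataclass
class ProgramGraph:
    """Directed attributed multigraph: parallel, typed edges are allowed."""
    node_text: list = field(default_factory=list)  # per-vertex attribute, e.g. an opcode
    edges: list = field(default_factory=list)      # (src, dst, edge_type, position)

    def add_node(self, text):
        self.node_text.append(text)
        return len(self.node_text) - 1

    def add_edge(self, src, dst, edge_type, position=0):
        # 'position' stands in for operand or branch ordering.
        self.edges.append((src, dst, edge_type, position))


def message_passing_step(graph, state, weights):
    """One synchronous round: each vertex adds the weighted states of its predecessors."""
    incoming = defaultdict(float)
    for src, dst, edge_type, _position in graph.edges:
        incoming[dst] += weights[edge_type] * state[src]
    # Plain additive update (an assumption; not a learned update function).
    return [state[v] + incoming[v] for v in range(len(graph.node_text))]


# Toy graph: three instructions on one control path, one instruction off that path.
g = ProgramGraph()
entry, add, ret, mul, x = (g.add_node(t) for t in ["br", "add", "ret", "mul", "%x"])
g.add_edge(entry, add, CONTROL)
g.add_edge(add, ret, CONTROL)
g.add_edge(add, ret, DATA)      # parallel edge of a different type (multigraph)
g.add_edge(mul, ret, CONTROL)   # 'mul' precedes 'ret' but is not reachable from 'entry'
g.add_edge(add, x, DATA)
g.add_edge(x, mul, DATA)

# Control-flow reachability from 'entry': propagate along control edges only.
w = {CONTROL: 1.0, DATA: 0.0, CALL: 0.0}
state = [1.0 if v == entry else 0.0 for v in range(len(g.node_text))]
for _ in range(len(g.node_text)):
    state = message_passing_step(g, state, w)

reachable = [v for v, s in enumerate(state) if s > 0]
print([g.node_text[v] for v in reachable])
```

In this toy setting, a vertex ends with a positive state exactly when a path of control edges connects it to the entry vertex, which is the per-vertex labelling that the control flow reachability task asks a model to predict. Keeping the edge type on every edge is what lets a single graph carry control, data, and call relations at the same time.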