Paper title

Semantic prefetching using forecast slices

Authors

Leeor Peled, Uri Weiser, Yoav Etsion

Abstract

Modern prefetchers identify memory access patterns in order to predict future accesses. However, many applications exhibit irregular access patterns that do not manifest spatio-temporal locality in the memory address space. Such applications usually do not fall under the scope of existing prefetching techniques, which observe only the stream of addresses dispatched by the memory unit but not the code flows that produce them. Similarly, temporal correlation prefetchers detect recurring relations between accesses, but do not track the chain of causality in program code that manifested the memory locality. Conversely, techniques that are code-aware are limited to the basic program functionality and are bounded by the machine depth. In this paper we show that contextual analysis of the code flows that generate memory accesses can detect recurring code patterns and expose their underlying semantics even for irregular access patterns. Moreover, program locality artifacts can be used to enhance the memory traversal code and predict future accesses. We present the semantic prefetcher that analyzes programs at run-time and learns their memory dependency chains and address calculation flows. The prefetcher then constructs forecast slices and injects them at key points to trigger timely prefetching of future contextually-related iterations. We show how this approach takes the best of both worlds, augmenting code injection with forecast functionality and relying on context-based temporal correlation of code slices. This combination allows us to overcome critical memory latencies that are currently not covered by any other prefetcher. Our evaluation of the semantic prefetcher using an industrial-grade, cycle-accurate x86 simulator shows that it improves performance by 24% on average over SPEC 2006 (outliers up to 3.7x), and 16% on average over SPEC 2017 (outliers up to 1.85x), using only ~6KB.
