Coherence Traffic in Manycore Processors with Opaque Distributed Directories

Paper Title

Coherence Traffic in Manycore Processors with Opaque Distributed Directories

Authors

Steve Kommrusch, Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño

Abstract

Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Intel Mesh interconnect, which consists of a network-on-chip interconnecting "tiles", each of which contains computation cores, local caches, and coherence masters. The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency. This paper studies the physical layout of an Intel Knights Landing processor, with a particular focus on the coherence subsystem, and uncovers the pseudo-random mapping function of physical memory blocks across the pieces of the distributed directory. Leveraging this knowledge, candidate optimizations to improve memory latency through the minimization of coherence traffic are studied. Although these optimizations do improve memory throughput, ultimately this does not translate into performance gains due to inherent overheads stemming from the computational complexity of the mapping functions.
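
To illustrate why evaluating a pseudo-random directory mapping in software carries a cost, the following is a minimal sketch of an XOR-based hash from a physical address to a directory slice. It is not the function uncovered in the paper: the bit masks, the slice count of 32, the 64-byte line size, and the names `parity` and `directory_slice` are all assumptions chosen for illustration.

```python
# Hypothetical sketch only: NOT the real Knights Landing mapping function.
LINE_BITS = 6      # assumed 64-byte cache lines
NUM_SLICES = 32    # assumed slice count (power of two keeps the sketch simple)

# Made-up masks: output bit i is the XOR (parity) of the address bits
# selected by MASKS[i].
MASKS = [0x1B5F, 0x2C3A, 0x4D91, 0x8E27, 0x1F0C4]


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def directory_slice(phys_addr: int) -> int:
    """Map a physical address to a (hypothetical) directory slice index."""
    line = phys_addr >> LINE_BITS  # all bytes of a cache line share one slice
    index = 0
    for bit, mask in enumerate(MASKS):
        index |= parity(line & mask) << bit
    return index


if __name__ == "__main__":
    # Print the slice chosen for a few consecutive cache lines.
    for addr in range(0, 8 * 64, 64):
        print(hex(addr), "->", directory_slice(addr))
```

Even this simplified form needs several mask-and-parity steps per address, which hints at how the cost of computing the mapping could offset gains from reduced coherence traffic.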
