Paper title
Fetch-Directed Instruction Prefetching Revisited
Paper authors
Paper abstract
Prior work has observed that fetch-directed instruction prefetching (FDIP) is highly effective at covering instruction cache misses. The key to FDIP's effectiveness is having a BTB large enough to accommodate the application's branch working set. In this work, we introduce several optimizations that significantly extend the reach of the BTB within the available storage budget. Our optimizations target nearly every source of storage overhead in each BTB entry; namely, the tag, target address, and size fields. We observe that while most dynamic branch instances have short offsets, a large number of branches have longer offsets or require the use of full target addresses. Based on this insight, we break up the BTB into multiple smaller BTBs, each storing offsets of a different length. This enables a dramatic reduction in storage for target addresses. We further compress tags to 16 bits and avoid the use of the basic-block-oriented BTB advocated in prior FDIP variants. The latter optimization eliminates the need to store the basic block size in each BTB entry. Our final design, called FDIP-X, uses an ensemble of 4 BTBs and always outperforms conventional FDIP with a unified basic-block-oriented BTB for equal storage budgets.
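The partitioning idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the offset widths (8, 16 and 24 bits), the 48-bit full-address width, the XOR-folding used to shorten tags, and all function names are assumptions chosen for illustration. Only the 16-bit tag width and the count of four BTBs come from the abstract.

```python
# Hypothetical parameters: the abstract does not give the offset widths.
OFFSET_WIDTHS = (8, 16, 24)   # signed offset bits held by three offset BTBs (assumed)
FULL_ADDRESS_BITS = 48        # fourth BTB stores the whole target address (assumed)
TAG_BITS = 16                 # tag width stated in the abstract


def offset_bits(pc: int, target: int) -> int:
    """Bits needed to store the signed offset (target - pc) in two's complement."""
    delta = target - pc
    # A signed value d fits in n bits when -2**(n-1) <= d < 2**(n-1).
    if delta >= 0:
        return delta.bit_length() + 1
    return (-delta - 1).bit_length() + 1


def select_btb(pc: int, target: int) -> int:
    """Index of the smallest partition able to hold this branch's target."""
    needed = offset_bits(pc, target)
    for index, width in enumerate(OFFSET_WIDTHS):
        if needed <= width:
            return index
    return len(OFFSET_WIDTHS)  # fall back to the full-address partition


def fold_tag(pc: int, bits: int = TAG_BITS) -> int:
    """XOR-fold a non-negative address into `bits` bits.

    This is one generic way to shorten a tag; the abstract does not say
    how the tags are compressed.
    """
    mask = (1 << bits) - 1
    tag = 0
    while pc:
        tag ^= pc & mask
        pc >>= bits
    return tag


def target_storage_bits(branches):
    """Target-field bits used: (partitioned design, unified full-address design)."""
    widths = OFFSET_WIDTHS + (FULL_ADDRESS_BITS,)
    partitioned = sum(widths[select_btb(pc, target)] for pc, target in branches)
    unified = FULL_ADDRESS_BITS * len(branches)
    return partitioned, unified
```

For example, under these assumed widths a branch at `0x1000` targeting `0x1010` lands in the 8-bit partition, while a branch whose target is far away falls through to the full-address partition. `target_storage_bits` gives a rough sense of why short offsets for most branches shrink the target-address storage.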