Analytical Modeling the Multi-Core Shared Cache Behavior with Considerations of Data-Sharing and Coherence

Paper title

Analytical Modeling the Multi-Core Shared Cache Behavior with Considerations of Data-Sharing and Coherence

Authors

Ling, Ming; Lu, Xiaoqian; Wang, Guangmin; Ge, Jiancong

Abstract

To mitigate the ever-worsening "power wall" and "memory wall" problems, multi-core architectures with multi-level cache hierarchies have been widely adopted in modern processors. However, the complexity of these architectures makes modeling shared caches extremely difficult. In this paper, we propose a data-sharing-aware analytical model for estimating the miss rates of the downstream shared cache in multi-core scenarios. Moreover, the proposed model can be integrated with upstream cache analytical models that take multi-core private cache coherence effects into account. This integration avoids the time-consuming full simulations of the cache architecture that are required by conventional approaches. We validate our analytical model against gem5 simulation results using 13 applications from the PARSEC 2.1 benchmark suite. Compared to the results of gem5 simulations under 8 hardware configurations, including dual-core and quad-core architectures, the average absolute error of the predicted shared L2 cache miss rates is less than 2% for all configurations. After integration with the refined upstream model that accounts for coherence misses, the overall average absolute error across 4 hardware configurations degrades to 8.03% due to error accumulation. The proposed coherence model achieves accuracy similar to that of the state-of-the-art approach with only one tenth of the time overhead. As an application case of the integrated model, we also evaluate the miss rates of 57 different multi-core and multi-level cache configurations.
