Thesis Title
Addressing Variability in Reuse Prediction for Last-Level Caches
Thesis Author
Thesis Abstract
The last-level cache (LLC) accounts for the bulk of a modern CPU's transistor budget and is essential for application performance, as it provides fast access to data in contrast to the much slower main memory. However, applications with large working sets often exhibit streaming and/or thrashing access patterns at the LLC. As a result, a large fraction of the LLC capacity is occupied by dead blocks that will not be referenced again, leading to inefficient utilization of the LLC capacity. To improve cache efficiency, state-of-the-art cache management techniques employ prediction mechanisms that learn from past access patterns with the aim of accurately identifying as many dead blocks as possible. Once identified, dead blocks are evicted from the LLC to make space for potentially high-reuse cache blocks.

In this thesis, we identify variability in the reuse behavior of cache blocks as the key factor limiting the cache efficiency of state-of-the-art predictive techniques. Variability in reuse prediction is inevitable, as it stems from numerous factors outside the control of the LLC; its sources include control-flow variation, speculative execution, and contention from cores sharing the cache, among others. This variability challenges existing techniques to reliably identify the end of a block's useful lifetime, resulting in lower prediction accuracy, lower coverage, or both. To address this challenge, this thesis aims to design cache management mechanisms and policies for the LLC that remain robust in the face of variability in reuse prediction and minimize cache misses, while keeping the cost and complexity of the hardware implementation low. To that end, we propose two cache management techniques, one domain-agnostic and one domain-specialized, that improve cache efficiency by addressing variability in reuse prediction.
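To make the "learn from past access patterns, then evict predicted-dead blocks" mechanism concrete, the sketch below shows one common form such a reuse predictor can take; it is not the thesis's proposed design. It models a SHiP-style table of saturating counters indexed by a hash of the load PC: the table is trained when blocks are evicted (reused or not) and is consulted at insertion to flag likely dead-on-arrival blocks. All table sizes, the hash function, and the names (ReusePredictor, train_on_eviction, predict_dead_on_insert) are illustrative assumptions.

```cpp
// Minimal sketch of a PC-signature-based reuse (dead-block) predictor.
// Illustrative only; structure sizes and hashing are assumptions.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

class ReusePredictor {
public:
    explicit ReusePredictor(std::size_t entries = 16384)
        : table_(entries, 1) {}  // 2-bit counters, start at 1 (weakly predict reuse)

    // Called when a block is evicted: learn whether blocks inserted by this
    // load PC tend to be reused before they are evicted.
    void train_on_eviction(std::uint64_t pc, bool was_reused) {
        std::uint8_t &ctr = table_[index(pc)];
        if (was_reused) {
            if (ctr < 3) ++ctr;   // block proved useful: bias toward "reuse"
        } else {
            if (ctr > 0) --ctr;   // block died untouched: bias toward "dead"
        }
    }

    // Called at insertion: predict whether the incoming block is likely dead.
    bool predict_dead_on_insert(std::uint64_t pc) const {
        return table_[index(pc)] == 0;
    }

private:
    std::size_t index(std::uint64_t pc) const {
        // Simple hash of the load PC into the signature table (assumption).
        return (pc ^ (pc >> 13)) % table_.size();
    }
    std::vector<std::uint8_t> table_;
};

int main() {
    ReusePredictor pred;
    const std::uint64_t streaming_pc = 0x401a20;  // hypothetical load PC

    // A streaming PC keeps inserting blocks that are evicted without reuse...
    for (int i = 0; i < 4; ++i)
        pred.train_on_eviction(streaming_pc, /*was_reused=*/false);

    // ...so future insertions from that PC are predicted dead on arrival.
    std::cout << std::boolalpha
              << pred.predict_dead_on_insert(streaming_pc) << "\n";  // prints: true
}
```

A replacement policy could consult predict_dead_on_insert to insert such blocks at low priority or bypass them, which is the general eviction strategy the abstract describes; the reuse-prediction variability discussed in the thesis is precisely what degrades the accuracy and coverage of this kind of learned prediction.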