Paper Title
CoPA: Cold Page Awakening to Overcome Retention Failures in STT-MRAM Based I/O Buffers
Paper Authors
Paper Abstract
Performance and reliability are two prominent factors in the design of data storage systems. To achieve higher performance, storage system designers have recently adopted DRAM-based buffers. The volatility of DRAM raises the possibility of data loss, so part of the main storage is conventionally used as a journal area from which unflushed data pages can be recovered after a power failure. Moreover, periodically flushing buffered data pages to the main storage is a common mechanism for preserving a high level of reliability, but it increases storage write traffic. To address this shortcoming, recent studies pair DRAM with a small NVM that serves as the Persistent Journal Area (PJA), an approach named NVM-Backed Buffer (NVB-Buffer). This approach aims to address DRAM's vulnerability to power failure while reducing storage write traffic. In this paper, we use STT-MRAM, the most promising of the emerging technologies for the PJA, because it meets the PJA requirements (high endurance, non-volatility, and DRAM-like latency). However, STT-MRAM faces major reliability challenges, namely retention failure, read disturbance, and write failure. We first show that retention failure is the dominant source of errors in NVB-Buffers, because such buffers suffer from long and unpredictable page idle intervals. We then propose a novel NVB-Buffer management scheme named Cold Page Awakening (CoPA), which predictably reduces the idle time of PJA pages. To this end, CoPA employs Distant Refreshing, which periodically overwrites vulnerable PJA page contents using their replicas in the DRAM-based buffer. We compare CoPA with state-of-the-art schemes over several workloads based on physical journaling. Our evaluations show that employing CoPA lowers the failure rate by three orders of magnitude with negligible performance degradation (1.1%) and memory overhead (1.2%).
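The refresh idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The abstract does not say how vulnerable pages are identified or how often they are refreshed, so the sketch assumes a fixed idle threshold (`refresh_interval_s`) and treats any PJA page idle for at least that long as cold. The names `NVBBuffer`, `write`, `refresh_cold_pages`, and `max_idle_s` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NVBBuffer:
    """Toy model of a DRAM buffer backed by an STT-MRAM journal area (PJA)."""

    refresh_interval_s: float  # assumed idle threshold; not given in the abstract
    dram: dict = field(default_factory=dict)          # page id -> data (replica)
    pja: dict = field(default_factory=dict)           # page id -> data (journal copy)
    last_write_s: dict = field(default_factory=dict)  # page id -> time of last PJA write

    def write(self, page, data, now_s):
        """Buffer a page in DRAM and journal it in the PJA."""
        self.dram[page] = data
        self.pja[page] = data
        self.last_write_s[page] = now_s

    def refresh_cold_pages(self, now_s):
        """Overwrite every cold PJA page from its DRAM replica.

        Rewriting a page resets its idle time, which is what bounds how long
        a journal page can sit untouched. Returns the refreshed page ids.
        """
        refreshed = []
        for page, written_s in list(self.last_write_s.items()):
            if now_s - written_s >= self.refresh_interval_s:
                self.pja[page] = self.dram[page]  # rewrite from the DRAM copy
                self.last_write_s[page] = now_s
                refreshed.append(page)
        return refreshed

    def max_idle_s(self, now_s):
        """Longest time any PJA page has gone without being written."""
        return max((now_s - t for t in self.last_write_s.values()), default=0.0)
```

If `refresh_cold_pages` is called at least once per interval, no page in this model stays idle for much longer than `refresh_interval_s`, which is the sense in which the idle time becomes predictable. A real design would also have to account for the cost of the extra PJA writes, which this sketch ignores.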