Paper Title

Optimal Caching for Low Latency in Distributed Coded Storage Systems

Authors

Liu, Kaiyang; Peng, Jun; Wang, Jingrong; Pan, Jianping

Abstract

Erasure codes have been widely considered a promising solution for enhancing data reliability at low storage cost. However, in modern geo-distributed storage systems, erasure codes may incur high data access latency because they require data retrieval from multiple remote storage nodes. This hinders the wide application of erasure codes to data-intensive applications. This paper proposes novel caching schemes to achieve low latency in distributed coded storage systems. Experiments based on Amazon Simple Storage Service confirm a positive correlation between data retrieval latency and physical distance. The average data access latency is used as the performance metric to quantify the benefits of caching. Assuming that future data popularity and network latency information is available, an offline caching scheme is proposed to find the optimal caching solution. Guided by the optimal scheme, an online caching scheme is proposed that uses data popularity and network latency information measured in real time. Experimental results demonstrate that the online scheme approximates the optimal scheme well with dramatically reduced computational complexity.
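
To make the performance metric in the abstract concrete, the following is a minimal toy sketch of a popularity-weighted average access latency. It is not the paper's formulation. It assumes that a read fetches all chunks of an item in parallel, so the slowest chunk determines the read latency, and that a cached chunk is served with zero latency. The names `item_latency`, `average_latency`, `items`, and `popularity`, as well as all latency values, are hypothetical.

```python
def item_latency(chunk_latencies, cached, cache_latency=0.0):
    """Latency of one read under the toy model.

    Chunks are fetched in parallel, so the slowest chunk dominates.
    Chunks whose index is in `cached` cost `cache_latency` instead.
    """
    return max(
        cache_latency if i in cached else latency
        for i, latency in enumerate(chunk_latencies)
    )


def average_latency(items, popularity, cache):
    """Popularity-weighted mean read latency over all items."""
    total = sum(popularity.values())
    return sum(
        popularity[name] / total
        * item_latency(chunks, cache.get(name, set()))
        for name, chunks in items.items()
    )


# Hypothetical per-chunk retrieval latencies in milliseconds.
items = {"a": [20, 80, 150], "b": [30, 40, 200]}
# Hypothetical request counts; item "a" is three times as popular as "b".
popularity = {"a": 3, "b": 1}

no_cache = average_latency(items, popularity, {})
cache_a = average_latency(items, popularity, {"a": {2}})  # cache slowest chunk of "a"
cache_b = average_latency(items, popularity, {"b": {2}})  # cache slowest chunk of "b"
```

In this toy setting, with room for only one cached chunk, caching the slowest chunk of the more popular item lowers the average more than caching the slowest chunk of the less popular one, even though the latter chunk is slower in absolute terms. This illustrates why both popularity and network latency matter when choosing what to cache.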
