Paper Title

Bonsai: A Generalized Look at Dual Deduplication

Authors

Hadi Sehat, Anders Lindskov Kloborg, Christian Mørup, Elena Pagnin, Daniel E. Lucani

Abstract

Cloud Service Providers (CSPs) offer a vast amount of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients' data private from the CSP. To achieve this, clients perform lightweight information-theoretic transformations on their data prior to upload. We investigate the effectiveness of dual deduplication and propose an improvement on the existing state-of-the-art method. We name our proposal Bonsai, as it aims at reducing the storage footprint and improving scalability. In detail, Bonsai achieves (1) a significant reduction in client storage, (2) a reduction in total required storage (client + CSP), and (3) a reduction in deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68% on the cloud and 5% on the client, while allowing the cloud to identify deduplications in a time-efficient manner. We also show that combining our method with a universal compressor in the cloud, e.g., Brotli, can yield better overall compression of the data than applying only the universal compressor or plain Bonsai. Finally, we show that Bonsai and its variants provide sufficient privacy against an honest-but-curious CSP that knows the distribution of the clients' original data.
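
The abstract does not spell out the transformation itself, so the following is only a toy sketch of the general dual deduplication workflow it describes: the client splits each chunk into an outsourced part and a small locally kept deviation, and the cloud deduplicates the outsourced parts. The transformation used here (stripping the last byte of each chunk), the chunk size, and all names such as `client_transform` and `ToyCloud` are illustrative assumptions, not Bonsai's actual method, and this toy makes no privacy claim.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical chunk length in bytes, chosen for illustration


def client_transform(chunk: bytes) -> tuple[bytes, bytes]:
    """Split a chunk into (outsourced part, private deviation).

    Assumed toy transformation: strip the last byte and keep it locally.
    This is NOT the transformation used by Bonsai.
    """
    return chunk[:-1], chunk[-1:]


class ToyCloud:
    """Hypothetical CSP that stores each distinct outsourced part once."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def upload(self, outsourced: bytes) -> str:
        # Deduplicate by content hash: identical parts share one entry.
        key = hashlib.sha256(outsourced).hexdigest()
        self.store.setdefault(key, outsourced)
        return key

    def download(self, key: str) -> bytes:
        return self.store[key]


def upload_file(data: bytes, cloud: ToyCloud) -> list[tuple[str, bytes]]:
    """Return the client-side recipe: (cloud key, deviation) per chunk."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        outsourced, deviation = client_transform(data[i:i + CHUNK_SIZE])
        recipe.append((cloud.upload(outsourced), deviation))
    return recipe


def restore_file(recipe: list[tuple[str, bytes]], cloud: ToyCloud) -> bytes:
    """Rebuild the original data from the cloud parts and local deviations."""
    return b"".join(cloud.download(key) + dev for key, dev in recipe)


if __name__ == "__main__":
    cloud = ToyCloud()
    data = b"ABCDEFG1ABCDEFG2ABCDEFG3"  # three chunks differing in one byte
    recipe = upload_file(data, cloud)
    print(len(cloud.store), restore_file(recipe, cloud) == data)
```

In this sketch the three chunks differ only in their final byte, so after the client-side transformation the cloud holds a single outsourced part while the client keeps three one-byte deviations plus the keys needed to restore the file.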
