Paper title
Toward a Better Understanding and Evaluation of Tree Structures on Flash SSDs
Authors
Abstract
Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low-latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they form the backbone of many of the data management systems used in production and research today. In this paper, we show that benchmarking a persistent tree-based data structure on an SSD is a complex process that is prone to subtle pitfalls, which can lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On the one hand, tree structures implement internal operations that have nontrivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, which are well known to lead to complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and the representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls in order to obtain more reliable performance measurements and to perform more thorough and fair comparisons among different design points.