论文标题

VIP哈希 - 适应即时数据流行(扩展版)

VIP Hashing -- Adapting to Skew in Popularity of Data on the Fly (extended version)

论文作者

Kakaraparthy, Aarati, Patel, Jignesh M., Kroth, Brian P., Park, Kwanghyun

论文摘要

所有数据都不流行。通常,与其他数据相比,一部分数据的频率更高,这会导致数据项普及。适应这种偏斜可以提高性能,并且过去已经对基于磁盘的设置进行了广泛的研究。在这项工作中,我们考虑了一个内存数据结构,即哈希表,并展示了人们如何利用偏斜的普及才能获得更高的性能。哈希是一种低延迟的操作,对缓存,分支预测和代码复杂性的影响敏感。这些因素使学习尤其具有挑战性,因为执行其他操作的开销可能是重要的。在本文中,我们提出了一种完全在线哈希表方法的VIP哈希(VIP Hashing),该方法使用轻巧的机制来学习偏斜的知名度并调整哈希表布局。这些机制是非阻滞的,它们的开销是通过将受欢迎程度分布的变化转变为根据需要动态切换/关闭学习机制来控制的。 We tested VIP hashing against a variety of workloads generated by Wiscer, a homegrown hashing measurement tool, and find that it improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with one million keys under low skew, 77% increase under medium skew) while being robust to insert and delete operations, and changing popularity distribution of keys.我们发现,VIP哈希将TPC-H查询9的端到端执行时间减少了,这是最昂贵的TPC-H查询,在中等偏斜下的20%。

All data is not equally popular. Often, some portion of data is more frequently accessed than the rest, which causes a skew in popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively in the past for disk-based settings. In this work, we consider an in-memory data structure, namely hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching, branch prediction, and code complexity among other factors. These factors make learning in-the-loop especially challenging as the overhead of performing any additional operations can be significant. In this paper, we propose VIP hashing, a fully online hash table method, that uses lightweight mechanisms for learning the skew in popularity and adapting the hash table layout. These mechanisms are non-blocking, and their overhead is controlled by sensing changes in the popularity distribution to dynamically switch-on/off the learning mechanism as needed. We tested VIP hashing against a variety of workloads generated by Wiscer, a homegrown hashing measurement tool, and find that it improves performance in the presence of skew (22% increase in fetch operation throughput for a hash table with one million keys under low skew, 77% increase under medium skew) while being robust to insert and delete operations, and changing popularity distribution of keys. We find that VIP hashing reduces the end-to-end execution time of TPC-H query 9, which is the most expensive TPC-H query, by 20% under medium skew.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源