论文标题

实践中二进制关系的压缩数据结构

Compressed Data Structures for Binary Relations in Practice

论文作者

Quijada-Fuentes, Carlos, Penabad, Miguel R., Ladra, Susana, Gutiérrez, Gilberto

论文摘要

二进制关系通常在计算机科学中用于建模数据。除了使用矩阵或列表的经典表示外,最近还提出了一些压缩数据结构来代表紧凑型空间中的二进制关系,例如$ k^2 $ - 树和二进制关系小波树(BRWT)。了解他们的存储需求,支持的操作和时间性能是实现给定域或应用程序,其数据分布以及通过数据计算的典型操作的适当选择数据表示的关键。 在这项工作中,我们对二元关系的几种压缩表示进行了经验比较。我们使用不同的(合成和真实)数据分布分析了它们的空间使用量和操作速度。我们包括邻里和集合操作,还提出了用于BRWT的设定操作算法,这些算法在文献中没有提出。我们得出的结论是,没有一个明确的选择要优于其余的选择,但是根据数据分配和对数据执行的操作类型的不同,我们提供了一些使用的建议。我们还包括对数据表示形式的可伸缩性研究。

Binary relations are commonly used in Computer Science for modeling data. In addition to classical representations using matrices or lists, some compressed data structures have recently been proposed to represent binary relations in compact space, such as the $k^2$-tree and the Binary Relation Wavelet Tree (BRWT). Knowing their storage needs, supported operations and time performance is key for enabling an appropriate choice of data representation given a domain or application, its data distribution and typical operations that are computed over the data. In this work, we present an empirical comparison among several compressed representations for binary relations. We analyze their space usage and the speed of their operations using different (synthetic and real) data distributions. We include both neighborhood and set operations, also proposing algorithms for set operations for the BRWT, which were not presented before in the literature. We conclude that there is not a clear choice that outperforms the rest, but we give some recommendations of usage of each compact representation depending on the data distribution and types of operations performed over the data. We also include a scalability study of the data representations.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源