Paper Title
Efficient query evaluation techniques over large amount of distributed linked data
Authors
Abstract
As RDF becomes more widely established and the amount of linked data increases rapidly, efficiently querying large amounts of data becomes a significant challenge. In this paper, we propose a family of algorithms for querying large amounts of linked data in a distributed manner. These query evaluation algorithms are independent of how the data is stored and of the particular implementation of the query evaluation. We then present a distributed implementation of these algorithms using the MapReduce paradigm and evaluate them experimentally, although the algorithms could be translated straightforwardly to other distributed processing frameworks. We also investigate and propose multiple approaches for decomposing Basic Graph Patterns (a subclass of SPARQL queries), which are used to improve the overall performance of distributed query answering. An in-depth analysis of the effectiveness of these decomposition algorithms is also provided.
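As a rough illustration of the general idea only, and not the algorithms proposed in the paper, the following sketch evaluates a two-pattern Basic Graph Pattern with a map/shuffle/reduce style join in plain Python. The toy triples, the function names (match, map_phase, shuffle, reduce_phase, evaluate_bgp), and the choice of a single shared join variable are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy RDF data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]


def is_var(term):
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A repeated variable must bind to the same value every time.
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def map_phase(triples, patterns, join_var):
    """Emit (join value, (pattern index, binding)) for each matching triple.

    Assumes every pattern contains join_var.
    """
    for triple in triples:
        for idx, pattern in enumerate(patterns):
            binding = match(pattern, triple)
            if binding is not None:
                yield binding[join_var], (idx, binding)


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Within each key group, join bindings of pattern 0 with pattern 1."""
    results = []
    for values in groups.values():
        left = [b for i, b in values if i == 0]
        right = [b for i, b in values if i == 1]
        for lb in left:
            for rb in right:
                # Keep only pairs that agree on all shared variables.
                if all(lb[v] == rb[v] for v in lb.keys() & rb.keys()):
                    results.append({**lb, **rb})
    return results


def evaluate_bgp(triples, patterns, join_var):
    """Evaluate a two-pattern BGP joined on a single variable."""
    return reduce_phase(shuffle(map_phase(triples, patterns, join_var)))


# BGP: { ?x knows ?y . ?y worksAt ?z }, joined on ?y.
PATTERNS = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
SOLUTIONS = evaluate_bgp(TRIPLES, PATTERNS, "?y")
```

In a real deployment the map and reduce functions would run on separate workers and a larger query would first be decomposed into such joins. The sketch only shows how grouping by the join variable brings compatible bindings together.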