Paper Title
An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes
Authors
Abstract
A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task is time-consuming when the query consists of frequently occurring words. If we cannot avoid the task by declaring frequently occurring words to be stop words and excluding them from consideration, we can instead optimize the solution by introducing additional indexes for faster execution. In previous work, we discussed how to decrease the search time with multi-component key indexes. We showed that additional indexes can improve the average query execution time by a factor of up to 130 when queries consist of frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides an even greater performance gain.
This is a pre-print of a contribution published in Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251, published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-55187-2_37
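To make the idea in the abstract concrete, the following is a minimal sketch and not the algorithm from the paper. It contrasts a proximity search over an ordinary inverted index with a lookup in an additional index keyed by word pairs, which is assumed here as the simplest possible multi-component key. All function names, the whitespace tokenizer, and the sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this sketch.
    return text.lower().split()


def build_word_index(docs):
    """Ordinary inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def build_pair_index(docs, max_distance):
    """Hypothetical two-component key index.

    Key: (w1, w2) sorted alphabetically. Value: {doc_id: [(p1, p2)]},
    stored only when the two occurrences are at most max_distance apart.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i, w1 in enumerate(words):
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                w2 = words[j]
                if w1 <= w2:
                    index[(w1, w2)][doc_id].append((i, j))
                else:
                    index[(w2, w1)][doc_id].append((j, i))
    return index


def search_with_word_index(index, w1, w2, max_distance):
    """Proximity search that scans the full posting lists of both words."""
    postings1 = index.get(w1, {})
    postings2 = index.get(w2, {})
    result = set()
    for doc_id in postings1.keys() & postings2.keys():
        if any(p != q and abs(p - q) <= max_distance
               for p in postings1[doc_id]
               for q in postings2[doc_id]):
            result.add(doc_id)
    return result


def search_with_pair_index(pair_index, w1, w2):
    """Proximity search as a single lookup; distance was checked at build time."""
    key = (w1, w2) if w1 <= w2 else (w2, w1)
    return set(pair_index.get(key, {}))


# Hypothetical sample data made of frequently occurring words.
docs = {
    1: "to be or not to be",
    2: "who are you and who is he",
    3: "you know who I am",
}
MAX_DISTANCE = 2
word_index = build_word_index(docs)
pair_index = build_pair_index(docs, MAX_DISTANCE)

print(search_with_word_index(word_index, "who", "you", MAX_DISTANCE))
print(search_with_pair_index(pair_index, "who", "you"))
```

In this sketch both searches return the same documents. The difference is that the word-level search must read every occurrence of each query word, which is costly for frequently occurring words, while the pair-level search reads only postings where the words are already known to be close. The trade-off is a larger index that is tied to the maximum distance chosen at build time.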