Paper title
The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art
Paper authors
Paper abstract
Increasingly large data series collections are becoming commonplace across many different domains and applications. A key operation in the analysis of data series collections is similarity search, which has attracted lots of attention and effort over the past two decades. Even though several relevant approaches have been proposed in the literature, none of the existing studies provides a detailed evaluation against the available alternatives. The lack of comparative results is further exacerbated by the non-standard use of terminology, which has led to confusion and misconceptions. In this paper, we provide definitions for the different flavors of similarity search that have been studied in the past, and present the first systematic experimental evaluation of the efficiency of data series similarity search techniques. Based on the experimental results, we describe the strengths and weaknesses of each approach and give recommendations for the best approach to use under typical use cases. Finally, by identifying the shortcomings of each method, our findings lay the ground for solid further developments in the field.