Paper Title
TKUS: Mining Top-K High-Utility Sequential Patterns
Paper Authors
Paper Abstract
High-utility sequential pattern mining (HUSPM) has recently emerged as a focus of intense research interest. The main task of HUSPM is to find all subsequences, within a quantitative sequential database, that have high utility with respect to a user-defined minimum utility threshold. However, it is difficult to specify the minimum utility threshold, especially when database features, which are invisible in most cases, are not understood. To handle this problem, top-k HUSPM was proposed. Up to now, only very preliminary work has been conducted to capture top-k HUSPs, and existing strategies require improvement in terms of running time, memory consumption, unpromising candidate filtering, and scalability. Moreover, no systematic problem statement has been defined. In this paper, we formulate the problem of top-k HUSPM and propose a novel algorithm called TKUS. To improve efficiency, TKUS adopts a projection and local search mechanism and employs several schemes, including the Sequence Utility Raising, Terminate Descendants Early, and Eliminate Unpromising Items strategies, which allow it to greatly reduce the search space. Finally, experimental results demonstrate that TKUS can achieve sufficiently good top-k HUSPM performance compared to the state-of-the-art algorithm TKHUS-Span.
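To make the top-k HUSPM task concrete, the following is a minimal brute-force sketch in Python. It is not the TKUS algorithm and uses none of its pruning strategies; it only illustrates the problem the abstract describes. The toy database, the unit profits, and the function names are hypothetical. The sketch also assumes that each element of a quantitative sequence holds a single item, and that the utility of a pattern in a sequence is the maximum over all of its occurrences in that sequence.

```python
import heapq
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each q-sequence is a list of (item, quantity) pairs.
DATABASE = [
    [("a", 2), ("b", 1), ("a", 3)],
    [("b", 2), ("a", 1), ("c", 4)],
    [("a", 1), ("c", 1)],
]
# Hypothetical unit profit (external utility) of each item.
UNIT_PROFIT = {"a": 3, "b": 5, "c": 1}


def sequence_utilities(qseq, profit, max_len):
    """Map each pattern found in qseq to its maximum utility over all occurrences."""
    best = {}
    for length in range(1, min(max_len, len(qseq)) + 1):
        # Every increasing tuple of positions is one occurrence of some pattern.
        for positions in combinations(range(len(qseq)), length):
            pattern = tuple(qseq[p][0] for p in positions)
            utility = sum(qseq[p][1] * profit[qseq[p][0]] for p in positions)
            if utility > best.get(pattern, 0):
                best[pattern] = utility
    return best


def top_k_patterns(database, profit, k, max_len=3):
    """Return the k patterns with the highest total utility, best first."""
    totals = defaultdict(int)
    for qseq in database:
        for pattern, utility in sequence_utilities(qseq, profit, max_len).items():
            totals[pattern] += utility

    # Min-heap of size k: heap[0][0] plays the role of the minimum utility
    # threshold, which rises automatically as better patterns are found.
    heap = []
    for pattern, utility in totals.items():
        if len(heap) < k:
            heapq.heappush(heap, (utility, pattern))
        elif utility > heap[0][0]:
            heapq.heapreplace(heap, (utility, pattern))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    for utility, pattern in top_k_patterns(DATABASE, UNIT_PROFIT, k=3):
        print(pattern, utility)
```

The user supplies only k; the threshold is never specified by hand, which is the motivation for top-k HUSPM given above. Enumerating every occurrence is exponential in sequence length, so this sketch is only practical on tiny inputs. Reducing that search space is what the strategies named in the abstract are meant to address.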