Discovering High Utility-Occupancy Patterns from Uncertain Data

Paper title

Discovering High Utility-Occupancy Patterns from Uncertain Data

Authors

Chen, Chien-Ming; Chen, Lili; Gan, Wensheng; Qiu, Lina; Ding, Weiping

Abstract

It is widely known that a great deal of useful information is hidden in big data, which has led to the saying that "data is money." Mining crucial information for use in real-world applications has therefore become common practice. Earlier studies considered frequency. Unfortunately, doing so neglects other aspects, such as utility, interest, or risk. It is therefore sensible to discover high-utility itemsets (HUIs) in transaction databases, using not only the quantity of each item but also its predefined utility. To find patterns that can represent their supporting transactions, a recent study mined high utility-occupancy patterns, whose contribution to the utility of the entire transaction is greater than a certain value. Moreover, in realistic applications, a pattern may not be known with certainty to exist in a transaction but may instead be associated with an existence probability. In this paper, a novel algorithm, called High-Utility-Occupancy Pattern Mining in Uncertain databases (UHUOPM), is proposed. The patterns found by the algorithm are called Potential High Utility Occupancy Patterns (PHUOPs). The algorithm divides user preferences into three factors: support, probability, and utility occupancy. To reduce memory cost and running time and to prune the search space, the algorithm uses a probability-utility-occupancy list (PUO-list) and a probability-frequency-utility table (PFU-table), which help provide the downward closure property. Furthermore, an original tree structure, called the support count tree (SC-tree), is constructed as the search space of the algorithm. Finally, substantial experiments were conducted on both real-life and synthetic datasets to evaluate the performance of the proposed UHUOPM algorithm, particularly in terms of effectiveness and efficiency.
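The three factors named in the abstract (support, probability, and utility occupancy) can be illustrated with a small brute-force sketch. The abstract does not give formal definitions, so the ones used here are assumptions: support is the number of transactions containing the pattern, probability is the sum of the existence probabilities of those transactions, and utility occupancy is the pattern's share of each supporting transaction's total utility, averaged over the supporting transactions. The toy database, the function names `measures` and `mine`, and the threshold parameters are all hypothetical. The sketch enumerates every itemset exhaustively and does not implement the PUO-list, PFU-table, or SC-tree.

```python
from itertools import combinations

# Hypothetical toy uncertain database. Each transaction is a pair:
# ({item: utility of that item in the transaction}, existence probability).
DB = [
    ({"a": 4, "b": 6}, 0.9),
    ({"a": 2, "c": 8}, 0.6),
    ({"a": 5, "b": 5, "c": 10}, 0.8),
    ({"b": 3}, 0.5),
]


def measures(pattern, db):
    """Return (support, probability, utility_occupancy) under the assumed definitions."""
    pattern = frozenset(pattern)
    support = 0
    probability = 0.0
    occupancy_sum = 0.0
    for utilities, prob in db:
        if pattern.issubset(utilities):  # the transaction contains every item
            support += 1
            probability += prob
            pattern_utility = sum(utilities[item] for item in pattern)
            occupancy_sum += pattern_utility / sum(utilities.values())
    occupancy = occupancy_sum / support if support else 0.0
    return support, probability, occupancy


def mine(db, min_sup, min_prob, min_uo):
    """Exhaustively test every itemset against the three thresholds."""
    items = sorted({item for utilities, _ in db for item in utilities})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            sup, prob, uo = measures(candidate, db)
            if sup >= min_sup and prob >= min_prob and uo >= min_uo:
                result[frozenset(candidate)] = (sup, prob, uo)
    return result


if __name__ == "__main__":
    for pattern, values in mine(DB, min_sup=2, min_prob=1.0, min_uo=0.5).items():
        print(sorted(pattern), values)
```

On this toy data the itemset {a} fails the utility-occupancy threshold while its superset {a, b} passes, so under these assumed definitions utility occupancy alone cannot be used to discard supersets. This is consistent with the abstract's statement that additional structures are needed to obtain a downward closure property for pruning.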
