Paper title
New advances in enumerative biclustering algorithms with online partitioning
Paper authors
Paper abstract
This paper further extends RIn-Close_CVC, a biclustering algorithm that performs an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. By avoiding a priori partitioning and itemization of the dataset, RIn-Close_CVC implements an online partitioning, which is demonstrated here to lead to more informative biclustering results. The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, as formally proved here, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime; the additional ability to handle datasets with missing values; and the additional ability to operate with attributes characterized by distinct distributions or even mixed data types. The experimental results include synthetic and real-world datasets used to perform scalability and sensitivity analyses. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining.
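
To make the target pattern concrete, the following is a minimal sketch of what "a maximal bicluster with constant values on columns" can mean in practice. It is not the RIn-Close_CVC or RIn-Close_CVC3 algorithm: it only checks a given candidate and enumerates nothing. It assumes a single user-defined tolerance `epsilon`, so that a column counts as "constant" when its values over the selected rows differ by at most `epsilon`. The function names `is_cvc_bicluster` and `is_maximal`, the toy matrix, and the tolerance value are all hypothetical and chosen for illustration only.

```python
def is_cvc_bicluster(data, rows, cols, epsilon):
    """Check the (assumed) constant-values-on-columns condition.

    For every selected column, the spread (max - min) of its values
    over the selected rows must not exceed epsilon.
    """
    rows, cols = list(rows), list(cols)
    if not rows or not cols:
        return False
    for j in cols:
        values = [data[i][j] for i in rows]
        if max(values) - min(values) > epsilon:
            return False
    return True


def is_maximal(data, rows, cols, epsilon):
    """Check that no single row or column can be added without
    violating the constant-values-on-columns condition."""
    rows, cols = list(rows), list(cols)
    if not is_cvc_bicluster(data, rows, cols, epsilon):
        return False
    n_rows, n_cols = len(data), len(data[0])
    for i in range(n_rows):
        if i not in rows and is_cvc_bicluster(data, rows + [i], cols, epsilon):
            return False
    for j in range(n_cols):
        if j not in cols and is_cvc_bicluster(data, rows, cols + [j], epsilon):
            return False
    return True


# Toy numerical dataset: 4 objects (rows) by 3 attributes (columns).
data = [
    [1.0, 5.0, 9.0],
    [1.1, 5.2, 2.0],
    [1.05, 4.9, 7.0],
    [3.0, 5.1, 7.1],
]
epsilon = 0.5  # illustrative tolerance, not a value from the paper

print(is_cvc_bicluster(data, [0, 1, 2], [0, 1], epsilon))
print(is_maximal(data, [0, 1, 2], [0, 1], epsilon))
print(is_maximal(data, [0, 1], [0, 1], epsilon))
```

In this toy example, rows 0 to 2 on columns 0 and 1 form a maximal bicluster under the assumed tolerance, whereas rows 0 and 1 on the same columns satisfy the column condition but are not maximal, since row 2 can still be added. A complete and non-redundant enumeration, as claimed for RIn-Close_CVC, would report each maximal bicluster exactly once and would not report non-maximal ones.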