Scalability achievements for enumerative biclustering with online partitioning: Case studies involving mixed-attribute datasets

Cited by: 3

Authors:

Veroneze, Rosana [1]

Von Zuben, Fernando J. [1]

Affiliations:

[1] Univ Campinas DCA FEEC, 400 Albert Einstein St, Campinas, SP, Brazil

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2021 / Volume 100

Funding:

São Paulo Research Foundation (FAPESP), Brazil;

Keywords:

Enumerative biclustering; Online partitioning of numerical datasets; Efficient enumeration; Quantitative class association rules; Supervised descriptive pattern mining; Mixed-attribute datasets; INTERESTINGNESS MEASURES; PATTERN; ALGORITHM;

DOI:

10.1016/j.engappai.2020.104147

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Biclustering is a powerful data analysis technique and its concept is appealing in many domains, such as natural sciences and market basket analysis. To exemplify the wide range of biclustering applications, we can also mention recommender systems, educational data mining, emerging topic detection and counterfeit product detection. In this paper, we further extend RIn-Close_CVC, a biclustering algorithm capable of performing, in numerical datasets, an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns. By avoiding a priori partitioning and itemization of the dataset, RIn-Close_CVC implements an online partitioning, which is demonstrated here to lead to more informative biclustering results. The improved algorithm, called RIn-Close_CVC3, is characterized by: a drastic reduction in memory usage; a consistent gain in runtime; an additional ability to handle datasets with missing values; and a new ability to operate on attributes characterized by distinct distributions or even mixed data types. Moreover, RIn-Close_CVC3 keeps those four attractive properties of RIn-Close_CVC, as formally proved here. The experimental results include synthetic and real-world datasets used to perform scalability and sensitivity analyses, as well as a comparative study involving a priori and online partitioning. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining.


Pages: 21