Distributed mining of maximal frequent itemsets on a Data Grid system

Cited by: 8

Authors:

Luo, Congnan [1]

Pereira, Anil L. [1]

Chung, Soon M. [1]

Affiliations:

[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA

Source:

JOURNAL OF SUPERCOMPUTING | 2006, Vol. 37, No. 1

Keywords:

Data Grid; distributed data mining; maximal frequent itemsets; association rules; scalability

DOI:

10.1007/s11227-006-5210-7

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is particularly well suited to Grid environments because of its low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets; together, these form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance scales with both the database size and the number of nodes.
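The definition of a maximal frequent itemset and the two-phase structure described in the abstract can be illustrated with a small sketch. This is not the GridDMM implementation: it uses brute-force enumeration for the local phase and plain database scans instead of the paper's prefix tree, and every function name, the relative support threshold `min_frac`, and the toy data are assumptions made for illustration. It relies only on the standard fact that an itemset frequent over the whole database must be frequent in at least one partition, so the local maximal frequent itemsets bound the global search from above.

```python
from itertools import combinations

def support(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_frac):
    """Brute-force enumeration of all frequent itemsets (illustration only)."""
    items = sorted(set().union(*db))
    threshold = min_frac * len(db)
    found = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, db) >= threshold:
                found.append(s)
    return found

def maximal(itemsets):
    """Keep only the itemsets that have no proper superset in the collection."""
    itemsets = set(itemsets)
    return {s for s in itemsets if not any(s < t for t in itemsets)}

def global_mfis(partitions, min_frac):
    """Two-phase sketch: local mining, then a top-down global search."""
    total = sum(len(p) for p in partitions)
    threshold = min_frac * total

    # Local phase: each partition ("node") finds its own maximal frequent itemsets.
    local = set()
    for p in partitions:
        local |= maximal(frequent_itemsets(p, min_frac))
    frontier = maximal(local)  # maximal candidate itemsets

    # Global phase: count candidates over all partitions; when a candidate is
    # infrequent, descend to its subsets with one item removed (top-down).
    frequent, seen = set(), set()
    while frontier:
        next_frontier = set()
        for c in frontier:
            if c in seen:
                continue
            seen.add(c)
            if sum(support(c, p) for p in partitions) >= threshold:
                frequent.add(c)
            elif len(c) > 1:
                next_frontier.update(c - {item} for item in c)
        frontier = next_frontier
    return maximal(frequent)

# Toy data: two partitions standing in for two Grid nodes (hypothetical).
p1 = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("d")]
p2 = [frozenset("ab"), frozenset("bc"), frozenset("bc"), frozenset("cd")]
result = global_mfis([p1, p2], 0.5)
print(sorted("".join(sorted(s)) for s in result))  # ['ab', 'bc']
```

In this toy run the only maximal candidate after the local phase is {a, b, c}; it is globally infrequent, so the search descends to its 2-item subsets, of which {a, b} and {b, c} meet the global threshold. In the sketch, the only data that would need to cross node boundaries are the local maximal itemsets and the candidate counts.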

Pages: 71-90

Page count: 20

