Distributed mining of maximal frequent itemsets on a Data Grid system

Times cited: 8
Authors
Luo, Congnan [1 ]
Pereira, Anil L. [1 ]
Chung, Soon M. [1 ]
Affiliation
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
Source
JOURNAL OF SUPERCOMPUTING | 2006 / Vol. 37 / No. 1
Keywords
Data Grid; distributed data mining; maximal frequent itemsets; association rules; scalability
DOI
10.1007/s11227-006-5210-7
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is particularly well suited to Grid environments because of its low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover its local maximal frequent itemsets; together, these itemsets form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated GridDMM in terms of performance, scalability, and communication and synchronization overhead. GridDMM shows better performance than other sequential and parallel algorithms, and its performance scales with both the database size and the number of nodes.
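The two-phase scheme described in the abstract can be illustrated with a small, single-process Python sketch. It is not the authors' GridDMM implementation. It assumes horizontally partitioned transactions (one partition per node), a relative minimum support threshold, and brute-force enumeration in place of Max-Miner for the local phase. `PrefixTree`, `local_maximal`, and `global_maximal` are hypothetical names, and the dict-based trie only approximates the idea of a prefix tree that counts candidates of different sizes in one pass. The sketch relies on the fact that an itemset frequent over the whole database must be frequent in at least one partition, so every global maximal frequent itemset is a subset of some local maximal frequent itemset.

```python
from itertools import combinations


class PrefixTree:
    """Hypothetical dict-based trie over sorted itemsets; counts mixed-size candidates in one pass."""

    def __init__(self):
        self.root = {}    # item -> child node; the key None marks where a candidate ends
        self.counts = {}  # candidate tuple -> support count

    def insert(self, itemset):
        node = self.root
        for item in itemset:
            node = node.setdefault(item, {})
        node[None] = tuple(itemset)
        self.counts[tuple(itemset)] = 0

    def count(self, transaction):
        items = sorted(transaction)

        def walk(node, start):
            if None in node:  # a candidate ends here and is contained in the transaction
                self.counts[node[None]] += 1
            for i in range(start, len(items)):
                child = node.get(items[i])
                if child is not None:
                    walk(child, i + 1)

        walk(self.root, 0)


def local_maximal(transactions, min_frac):
    """Brute-force local maximal frequent itemsets (assumption: stands in for Max-Miner; tiny data only)."""
    items = sorted({i for t in transactions for i in t})
    need = min_frac * len(transactions)
    frequent = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            if sum(1 for t in transactions if set(cand) <= t) >= need:
                frequent.append(frozenset(cand))
    return [f for f in frequent if not any(f < g for g in frequent)]


def global_maximal(partitions, min_frac):
    need = min_frac * sum(len(p) for p in partitions)
    # Local phase: the union of every node's local maximal itemsets seeds the candidates.
    candidates = set()
    for part in partitions:
        candidates |= {tuple(sorted(m)) for m in local_maximal(part, min_frac)}
    frequent, seen = set(), set()
    # Global phase: top-down search over the candidates.
    while candidates:
        tree = PrefixTree()
        for c in candidates:
            tree.insert(c)
        # On a real Data Grid each node would count its own partition and the
        # counts would be summed; here one tree counts all partitions in-process.
        for part in partitions:
            for t in part:
                tree.count(t)
        seen |= candidates
        next_round = set()
        for cand, cnt in tree.counts.items():
            if cnt >= need:
                frequent.add(cand)
            elif len(cand) > 1:  # infrequent: descend to its (k-1)-subsets
                next_round |= set(combinations(cand, len(cand) - 1))
        # Skip subsets already covered by a known frequent itemset.
        candidates = {s for s in next_round - seen
                      if not any(set(s) <= set(f) for f in frequent)}
    # Keep only frequent itemsets with no frequent proper superset.
    return {f for f in frequent if not any(set(f) < set(g) for g in frequent)}


if __name__ == "__main__":
    partitions = [
        [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}],       # node 1
        [{"a", "b", "c"}, {"b", "c"}, {"a", "b", "c"}],  # node 2
    ]
    print(sorted(global_maximal(partitions, 0.6)))
    # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

In this example, node 2 proposes `('a', 'b', 'c')` as a local maximal itemset, but it is globally infrequent at 60% support, so the search descends to its 2-subsets and finds `('b', 'c')`, which neither node's candidate set contained at the top level. Because only infrequent candidates are expanded, each global maximal frequent itemset is reached from a local maximal itemset that contains it, and the final filter discards frequent itemsets that have a frequent superset.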
Pages: 71-90
Number of pages: 20