Distributed mining of maximal frequent itemsets on a Data Grid system

Cited by: 8
Authors
Luo, Congnan [1]
Pereira, Anil L. [1]
Chung, Soon M. [1]
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
Source
JOURNAL OF SUPERCOMPUTING | 2006 / Vol. 37 / Issue 01
Keywords
Data Grid; distributed data mining; maximal frequent itemsets; association rules; scalability;
DOI
10.1007/s11227-006-5210-7
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is specifically suitable for use in Grid environments due to low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance is scalable in terms of the database size and the number of nodes.
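The definition of a maximal frequent itemset, and the reason local results can seed a global search, can be illustrated with a small brute-force sketch. This is not GridDMM and does not use the paper's prefix-tree structure; the function names, the two toy "node" databases, and the 0.5 minimum support are all assumptions made for illustration. It relies only on the standard partition property: an itemset that is frequent over the whole database must be frequent in at least one partition at the same relative support, so it is contained in some local maximal frequent itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: every itemset whose relative support is >= min_support."""
    items = sorted(set().union(*transactions))
    min_count = min_support * len(transactions)
    result = set()
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                result.add(cand)
    return result


def maximal_frequent_itemsets(transactions, min_support):
    """A frequent itemset is maximal if none of its proper supersets is frequent."""
    frequent = frequent_itemsets(transactions, min_support)
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical local databases held by two nodes (illustrative data only).
node_a = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]
node_b = [{"a", "d"}, {"a", "d"}, {"b", "d"}, {"c"}]
MIN_SUPPORT = 0.5  # assumed relative minimum support

# Local phase: each node mines its own data.
local_a = maximal_frequent_itemsets(node_a, MIN_SUPPORT)
local_b = maximal_frequent_itemsets(node_b, MIN_SUPPORT)
candidates = local_a | local_b

# Global phase, done here by brute force over the combined data.
global_mfi = maximal_frequent_itemsets(node_a + node_b, MIN_SUPPORT)

# Every global maximal frequent itemset lies under some local candidate,
# which is why a top-down search from the candidates cannot miss any.
covered = all(any(g <= c for c in candidates) for g in global_mfi)
```

In this toy data the local candidates are {a, b, c} and {a, d}, neither of which is globally frequent, so a top-down search would descend to their subsets to find the global result.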
Pages: 71-90
Page count: 20
Related papers
50 in total
  • [1] Distributed Mining of Maximal Frequent Itemsets on a Data Grid System
    Congnan Luo
    Anil L. Pereira
    Soon M. Chung
    [J]. The Journal of Supercomputing, 2006, 37 : 71 - 90
  • [2] Mining maximal frequent itemsets from data streams
    Mao, Guojun
    Wu, Xindong
    Zhu, Xingquan
    Chen, Gong
    Liu, Chunnian
    [J]. JOURNAL OF INFORMATION SCIENCE, 2007, 33 (03) : 251 - 262
  • [3] Efficiently mining maximal frequent itemsets
    Gouda, K
    Zaki, MJ
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 163 - 170
  • [4] On Maximal Frequent Itemsets Mining with Constraints
    Jabbour, Said
    Mana, Fatima Ezzahra
    Dlala, Imen Ouled
    Raddaoui, Badran
    Sais, Lakhdar
    [J]. PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, 2018, 11008 : 554 - 569
  • [5] New Policy of Maximal Frequent Itemsets in Data Stream Mining
    Xu, ChongHuan
    Ju, ChunHua
    [J]. ADVANCED MECHANICAL ENGINEERING, PTS 1 AND 2, 2010, 26-28 : 118 - +
  • [6] A novel approach for data stream maximal frequent itemsets mining
    Xu, Chong-Huan
    [J]. Inderscience Enterprises Ltd., (10)
  • [7] Distributed mining of maximal frequent itemsets from Databases on a cluster of workstations
    Chung, SM
    Luo, CN
    [J]. 2004 IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID - CCGRID 2004, 2004, : 499 - 507
  • [8] Mining maximal frequent itemsets with frequent pattern list
    Qian, Jin
    Ye, Feiyue
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2007, : 628 - 632
  • [9] Distributed Frequent Closed Itemsets Mining
    Liu, Chun
    Zheng, Zheng
    Cai, Kai-Yuan
    Zhang, Shichao
    [J]. SITIS 2007: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL IMAGE TECHNOLOGIES & INTERNET BASED SYSTEMS, 2008, : 43 - 50
  • [10] Mining maximal frequent itemsets for intrusion detection
    Wang, H
    Li, QH
    Xiong, HY
    Jiang, SY
    [J]. GRID AND COOPERATIVE COMPUTING GCC 2004 WORKSHOPS, PROCEEDINGS, 2004, 3252 : 422 - 429