Effect of data skewness in parallel mining of association rules

Authors
Cheung, DW [1]
Xiao, YQ [1]
Affiliation
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong
Keywords
association rules; data mining; data skewness; parallel computing
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An efficient parallel algorithm, FPM (Fast Parallel Mining), is proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, requiring only one round of message exchange per iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is only mild. We implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observations on the relationship between data skewness and the effectiveness of the two pruning techniques. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good parallel performance in terms of speedup, scaleup, and sizeup.
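The abstract names the two pruning techniques and the notion of skewness without defining them. The sketch below is a minimal illustration, assuming the pruning rules as they are commonly described for count-distribution miners of this family, not a reproduction of the authors' FPM code. It assumes each partition's local support counts for the (k-1)-itemsets are known to every processor (as they are after a count-distribution exchange). Distributed pruning lets partition i generate candidates only from itemsets that are both globally large and locally large at i. Global pruning bounds a candidate's count at each partition by the smallest local count among its (k-1)-subsets. The entropy-based `skewness` measure is an assumed way to quantify non-uniformity, not necessarily the paper's definition, and all function names are hypothetical.

```python
from itertools import combinations
from math import log

# Illustrative sketch only: function names are hypothetical, not from the paper.
# Itemsets are sorted tuples; local_counts[i] maps itemset -> count at partition i.


def skewness(local_counts):
    """Entropy-based skewness of one itemset's counts across n partitions
    (assumed metric): 0.0 = uniform spread, 1.0 = all in one partition."""
    n, total = len(local_counts), sum(local_counts)
    if n < 2 or total == 0:
        return 0.0
    h = -sum((c / total) * log(c / total) for c in local_counts if c > 0)
    return (log(n) - h) / log(n)


def apriori_gen(large):
    """Standard Apriori candidate generation: join pairs sharing a
    (k-2)-prefix, then drop candidates with a (k-1)-subset outside `large`."""
    large = set(large)
    cands = set()
    for a, b in combinations(sorted(large), 2):
        if a[:-1] == b[:-1]:  # same prefix and sorted order, so a[-1] < b[-1]
            x = a + (b[-1],)
            if all(s in large for s in combinations(x, len(x) - 1)):
                cands.add(x)
    return cands


def globally_large(local_counts, part_sizes, minsup):
    """(k-1)-itemsets whose count summed over partitions reaches minsup * |D|."""
    total = sum(part_sizes)
    items = set().union(*(c.keys() for c in local_counts))
    return {y for y in items
            if sum(c.get(y, 0) for c in local_counts) >= minsup * total}


def distributed_candidates(local_counts, part_sizes, minsup):
    """Distributed pruning: partition i generates candidates only from
    itemsets that are globally large AND locally large at partition i."""
    gl = globally_large(local_counts, part_sizes, minsup)
    cands = set()
    for counts, size in zip(local_counts, part_sizes):
        heavy = {y for y in gl if counts.get(y, 0) >= minsup * size}
        cands |= apriori_gen(heavy)
    return cands


def global_prune(cands, local_counts, part_sizes, minsup):
    """Global pruning: a candidate's count at partition i cannot exceed the
    smallest local count of its (k-1)-subsets there; drop the candidate if
    the sum of these per-partition upper bounds is below minsup * |D|."""
    total = sum(part_sizes)
    kept = set()
    for x in cands:
        subsets = list(combinations(x, len(x) - 1))
        bound = sum(min(c.get(s, 0) for s in subsets) for c in local_counts)
        if bound >= minsup * total:
            kept.add(x)
    return kept
```

For example, take two partitions of 12 transactions each, `minsup = 0.25`, and local counts a: 5/1, b: 5/1, c: 1/5. All three items are globally large (count 6 each), so plain Apriori generates ab, ac, and bc. Distributed pruning generates only ab, because c is locally large only in the second partition, where a and b are not. Global pruning also keeps only ab (upper bounds 6, 2, and 2 against a threshold of 6). Under the assumed metric, each item's skewness is about 0.35, while an evenly split count such as 3/3 scores 0.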
Pages: 48-60
Number of pages: 13