Effect of data skewness in parallel mining of association rules

Cited by: 0

Authors:

Cheung, DW [1]

Xiao, YQ [1]

Affiliations:

[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong

Source:

RESEARCH AND DEVELOPMENT IN KNOWLEDGE DISCOVERY AND DATA MINING | 1998 / Vol. 1394

Keywords:

association rules; data mining; data skewness; parallel computing

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

An efficient parallel algorithm, FPM (Fast Parallel Mining), for mining association rules on a shared-nothing parallel system has been proposed. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, requiring only one round of message exchange in each iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning, even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good parallel behavior in terms of speedup, scaleup and sizeup.
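The count distribution idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FPM implementation: the function names, the toy data, and the in-process "exchange" (a plain sum over partitions standing in for one round of message passing) are all assumptions made for illustration. The helper `locally_large_sites` shows a property that pruning on skewed data can exploit: an itemset that is globally large must be locally large in at least one partition, and under high skew it is often large in only a few.

```python
def local_counts(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    return {c: sum(1 for t in partition if c <= t) for c in candidates}


def count_distribution_pass(partitions, candidates, min_support):
    """One iteration: every site counts locally, then counts are summed once.

    The sum below stands in for the single round of message exchange;
    on a real shared-nothing system each site would run in parallel.
    """
    per_site = [local_counts(p, candidates) for p in partitions]
    total = sum(len(p) for p in partitions)
    global_counts = {c: sum(site[c] for site in per_site) for c in candidates}
    large = {c for c, n in global_counts.items() if n >= min_support * total}
    return large, per_site


def locally_large_sites(itemset, partitions, per_site, min_support):
    """Indices of partitions where the itemset meets the local support threshold."""
    return [
        i for i, p in enumerate(partitions)
        if per_site[i][itemset] >= min_support * len(p)
    ]


# Hypothetical, deliberately skewed toy data: {a, b} lives in partition 0,
# {c, d} lives in partition 1.
partitions = [
    [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}],
    [{"c", "d"}, {"c", "d"}, {"c"}, {"d", "a"}],
]
candidates = [frozenset(p) for p in ("ab", "ac", "ad", "bc", "bd", "cd")]

large, per_site = count_distribution_pass(partitions, candidates, min_support=0.25)
sites_ab = locally_large_sites(frozenset("ab"), partitions, per_site, 0.25)
sites_cd = locally_large_sites(frozenset("cd"), partitions, per_site, 0.25)
```

In this toy data each globally large 2-itemset is locally large at exactly one site, which is the kind of high-skew situation the abstract associates with effective distributed pruning.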


Pages: 48 - 60

Page count: 13

