A load-balanced distributed parallel mining algorithm

Cited by: 20
Authors
Yu, Kun-Ming [2]
Zhou, Jiayi [1]
Hong, Tzung-Pei [3]
Zhou, Jia-Ling [4]
Affiliations
[1] Chung Hua Univ, Inst Engn & Sci, Hsinchu 300, Taiwan
[2] Chung Hua Univ, Dept Comp Sci & Informat Engn, Hsinchu 300, Taiwan
[3] Natl Univ Kaohsiung, Dept Comp Sci & Informat Engn, Kaohsiung 811, Taiwan
[4] Chung Hua Univ, Dept Informat Management, Hsinchu 300, Taiwan
Keywords
Parallel and distributed processing; Cluster computing; Frequent patterns; Association rules; Data mining;
DOI
10.1016/j.eswa.2009.07.074
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to the exponential growth in worldwide information, companies have to deal with an ever-growing amount of digital information. One of the most important challenges for data mining is finding the relationships among data quickly and correctly. The Apriori algorithm has been the most popular technique for finding frequent patterns. However, when this method is applied, the database has to be scanned many times to calculate the counts of a huge number of candidate itemsets. Parallel and distributed computing is an effective strategy for accelerating the mining process. In this paper, the Distributed Parallel Apriori (DPA) algorithm is proposed as a solution to this problem. In the proposed method, metadata are stored in the form of Transaction Identifiers (TIDs), so that only a single scan of the database is needed. The approach also takes itemset counts into consideration, thus generating a balanced workload among processors and reducing processor idle time. Experiments on a PC cluster with 16 computing nodes were conducted to show the performance of the proposed approach and to compare it with some other parallel mining algorithms. The experimental results show that the proposed approach outperforms the others, especially when the minimum support is low. © 2009 Elsevier Ltd. All rights reserved.
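The two ideas named in the abstract, TID-based counting after a single database scan and workload balancing driven by itemset counts, can be illustrated with a small single-process sketch. This is not the authors' DPA implementation: the function names (`build_tid_lists`, `frequent_itemsets`, `balance`), the greedy heaviest-first assignment, and the use of a per-itemset count as the workload weight are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def build_tid_lists(transactions):
    """Scan the database once, mapping each item to the set of
    transaction IDs (TIDs) of the transactions that contain it."""
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return tids


def frequent_itemsets(transactions, min_count):
    """Level-wise (Apriori-style) mining that counts support by
    intersecting TID sets instead of rescanning the database."""
    tids = build_tid_lists(transactions)
    level = {frozenset([i]): t for i, t in tids.items() if len(t) >= min_count}
    result = {}
    while level:
        result.update({itemset: len(t) for itemset, t in level.items()})
        next_level = {}
        for a, b in combinations(level, 2):
            cand = a | b
            if len(cand) != len(a) + 1 or cand in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(cand - {x} not in level for x in cand):
                continue
            shared = level[a] & level[b]  # TIDs containing every item of cand
            if len(shared) >= min_count:
                next_level[cand] = shared
        level = next_level
    return result


def balance(weights, n_workers):
    """Hypothetical balancing step: hand the heaviest remaining itemset to
    the currently least-loaded worker. Returns (load, worker, itemsets)
    tuples ordered by worker index."""
    heap = [(0, w, []) for w in range(n_workers)]
    for key, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, worker, bucket = heapq.heappop(heap)
        bucket.append(key)
        heapq.heappush(heap, (load + weight, worker, bucket))
    return sorted(heap, key=lambda entry: entry[1])
```

In a real cluster setting, each worker would receive its bucket of itemsets together with the corresponding TID sets and extend them independently; how the paper weights and distributes the work is not specified in the abstract, so the weighting here is only a stand-in.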
Pages: 2459-2464
Page count: 6