A fast and resource efficient mining algorithm for discovering frequent patterns in distributed computing environments

Cited by: 16
Authors
Lin, Kawuu W. [1 ]
Chung, Sheng-Hao [1 ]
Affiliations
[1] Natl Kaohsiung Univ Appl Sci, Dept Comp Sci & Informat Engn, Kaohsiung 807, Taiwan
Keywords
Data mining; Frequent pattern mining; Distributed mining; Parallel mining; Databases
DOI
10.1016/j.future.2015.05.009
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Advances in electronic technology enable us to collect logs from various devices. Such logs require detailed analysis to be broadly useful. Data mining is a technique widely used to extract hidden information from such data, and it mainly comprises association rule mining, sequential pattern mining, classification, and clustering. Association rule mining has attracted significant attention and has been successfully applied in various fields. Although past studies can effectively discover the frequent patterns from which association rules are deduced, execution efficiency remains a critical problem. To speed up execution, many methods using parallel and distributed computing technology have been proposed in recent years. Most past studies focused on parallelizing the workload on a high-end machine or in distributed computing environments such as grid or cloud computing systems; however, very few discuss how to efficiently determine the appropriate number of computing nodes with execution efficiency and load balancing in mind. An intuitive assumption is that execution speed is proportional to the number of computing nodes, that is, the more computing nodes, the faster the execution. However, this is incorrect for such algorithms because of their inherent algorithmic design. Allocating too many computing nodes can lead to long execution times. Besides this inefficiency, inappropriate resource allocation wastes computing power and network bandwidth. At the same time, the load cannot be effectively distributed if too few nodes are allocated. In this paper, we propose a fast, load-balancing, and resource-efficient algorithm named FLR-Mining for discovering frequent patterns in distributed computing systems. FLR-Mining is capable of determining the appropriate number of computing nodes automatically and achieves better load balancing than existing methods. Through empirical evaluation, FLR-Mining is shown to deliver excellent performance in terms of execution efficiency and load balancing. © 2015 Elsevier B.V. All rights reserved.
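The abstract's point that adding nodes does not always speed up execution can be illustrated with a toy cost model. The sketch below is not FLR-Mining and is not taken from the paper; it assumes, purely for illustration, that compute time shrinks as `work / nodes` while coordination cost grows linearly with the node count. The names `estimated_time`, `best_node_count`, `work`, and `overhead_per_node` are hypothetical.

```python
def estimated_time(nodes: int, work: float, overhead_per_node: float) -> float:
    """Toy model (an assumption, not the paper's): parallel compute time
    plus a coordination cost that grows with the number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return work / nodes + overhead_per_node * nodes


def best_node_count(work: float, overhead_per_node: float, max_nodes: int) -> int:
    """Return the node count in 1..max_nodes with the lowest estimated time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda n: estimated_time(n, work, overhead_per_node),
    )


if __name__ == "__main__":
    # Hypothetical workload: 100 units of work, 1 unit of overhead per node.
    for n in (1, 10, 64):
        print(n, estimated_time(n, work=100.0, overhead_per_node=1.0))
    print("best:", best_node_count(100.0, 1.0, max_nodes=64))
```

Under these assumed parameters the estimated time falls until about ten nodes and then rises again, which matches the abstract's qualitative claim that both too few and too many nodes hurt. A real system would have to measure or estimate these costs rather than assume a fixed formula.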
Pages: 49-58
Page count: 10
Related papers
50 in total
  • [1] Determining the appropriate number of nodes for fast mining of frequent patterns in distributed computing environments
    Lin, Wei-Tee
    Chu, Chih-Ping
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2015, 30 (05) : 380 - 392
  • [2] A Distributed Algorithm for Fast Mining Frequent Patterns in Limited and Varying Network Bandwidth Environments
    Lin, Chun-Cheng
    Li, Wei-Ching
    Chen, Ju-Chin
    Chung, Wen-Yu
    Chung, Sheng-Hao
    Lin, Kawuu W.
    APPLIED SCIENCES-BASEL, 2019, 9 (09):
  • [3] A fast and distributed algorithm for mining frequent patterns in congested networks
    Kawuu W. Lin
    Sheng-Hao Chung
    Chun-Cheng Lin
    Computing, 2016, 98 : 235 - 256
  • [4] A fast and distributed algorithm for mining frequent patterns in congested networks
    Lin, Kawuu W.
    Chung, Sheng-Hao
    Lin, Chun-Cheng
    COMPUTING, 2016, 98 (03) : 235 - 256
  • [5] An Efficient and Fast Algorithm for Mining Frequent Patterns on Multiple Biosequences
    Liu, Wei
    Chen, Ling
    COMPUTER AND COMPUTING TECHNOLOGIES IN AGRICULTURE IV, PT 1, 2011, 344 : 178 - 194
  • [6] A Fast Parallel Algorithm for Discovering Frequent Patterns
    Lin, Kawuu W.
    Luo, Yu-Chin
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING ( GRC 2009), 2009, : 398 - 403
  • [7] A fast and low idle time method for mining frequent patterns in distributed and many-task computing environments
    Chun-Cheng Lin
    Sheng-Hao Chung
    Ju-Chin Chen
    Yuan-Tse Yu
    Kawuu W. Lin
    Distributed and Parallel Databases, 2018, 36 : 613 - 641
  • [8] A fast and low idle time method for mining frequent patterns in distributed and many-task computing environments
    Lin, Chun-Cheng
    Chung, Sheng-Hao
    Chen, Ju-Chin
    Yu, Yuan-Tse
    Lin, Kawuu W.
    DISTRIBUTED AND PARALLEL DATABASES, 2018, 36 (04) : 613 - 641
  • [9] A fast algorithm for mining frequent patterns
    Ruan, YL
    Zhang, JJ
    Li, QH
    Yang, SD
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1683 - 1686
  • [10] A Efficient Algorithm for Discovering all Frequent Patterns
    Chen, Fuzan
    Li, Minqiang
    Kou, Jisong
    PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL II, 2009, : 351 - 355