A Utility-Based Distributed Pattern Mining Algorithm With Reduced Shuffle Overhead

Cited by: 3
Authors
Kumar, Sunil [1]
Mohbey, Krishna Kumar [1]
Affiliations
[1] Cent Univ Rajasthan, Dept Comp Sci, Ajmer 305817, Rajasthan, India
Keywords
Communication cost; distributed computing; high utility pattern mining; scalability; Spark; MapReduce; itemsets
DOI
10.1109/TPDS.2022.3221210
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the arrival of the digital era and advances in information transmission technologies, data volumes have risen at an unprecedented rate. Efficiently extracting useful information from this data has attracted growing interest from academia and industry. Data mining research focuses on finding utility patterns in large datasets. However, inherent complications such as repeated dataset scans and the generation of large candidate sets plague the mining process on large datasets. Approaches based on distributed architectures also prove inefficient because of high communication overhead across iterations, and the cost of exchanging data both locally and remotely aggravates the problem. To address this issue, we propose a Communication Cost Effective Utility-based Pattern Mining (CEUPM) algorithm built on the Spark framework. Spark accelerates iterative scanning by keeping scanned datasets in a memory abstraction called resilient distributed datasets (RDDs). RDD operations require data to be redistributed among cluster nodes during processing, and this redistribution, or shuffle, incurs communication overhead. To minimize the communication cost incurred during the shuffle, we adopt a search space division strategy based on data parallelism that allocates tasks fairly and effectively across cluster nodes. Experimental results on four real datasets demonstrate that CEUPM considerably reduces shuffling overhead and outperforms existing methods in terms of memory usage, communication cost, execution time, and scalability.
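The abstract does not spell out how the search space is divided, so the following is only an illustrative sketch of the general idea, not the CEUPM procedure itself. It assumes that each item roots one sub-search-space, that transaction-weighted utility (TWU, a standard measure in high utility mining) is an acceptable proxy for the work in that subspace, and that a greedy "heaviest item to lightest partition" rule is used for balancing. The toy transactions, function names, and partition count are all hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: each transaction maps item -> utility
# (for example, purchased quantity times unit profit).
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"b": 4, "c": 3},
    {"a": 10, "d": 6},
    {"c": 2, "d": 2, "e": 8},
]

def transaction_weighted_utility(transactions):
    """For each item, sum the total utility of every transaction containing it."""
    twu = defaultdict(int)
    for t in transactions:
        transaction_utility = sum(t.values())
        for item in t:
            twu[item] += transaction_utility
    return dict(twu)

def divide_search_space(load, num_partitions):
    """Assumed balancing rule: give the heaviest remaining item to the
    partition with the smallest accumulated load so far."""
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; break ties by item name for determinism.
    for item, weight in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, partition = heapq.heappop(heap)
        assignment[item] = partition
        heapq.heappush(heap, (total + weight, partition))
    return assignment

twu = transaction_weighted_utility(transactions)
assignment = divide_search_space(twu, num_partitions=2)
print(twu)
print(assignment)
```

In a Spark setting, an assignment like this could serve as the key for a single up-front repartitioning, so that each node mines its own subspaces locally instead of exchanging candidates at every iteration. Whether CEUPM works this way is not stated in the abstract.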
Pages: 416-428
Page count: 13