On using MapReduce to scale algorithms for Big Data analytics: a case study

Cited by: 4
Authors
Kijsanayothin, Phongphun [1]
Chalumporn, Gantaphon [2]
Hewett, Rattikorn [2]
Affiliations
[1] Naresuan Univ, Dept Elect & Comp Engn, NU, Phitsanulok, Thailand
[2] Texas Tech Univ, Dept Comp Sci, TTU, Lubbock, TX 79409 USA
Keywords
Big Data analytics algorithms; Association rules mining; MapReduce; Parallel computing; A-PRIORI ALGORITHM; PARALLEL; MODEL
DOI
10.1186/s40537-019-0269-1
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Introduction: Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines, has contributed to advances in many Big Data analytics algorithms. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.
Case description: This paper investigates a case study of the problem of scaling a popular association rule-mining algorithm into a "Big algorithm", specifically the development of the Apriori algorithm in the MapReduce model.
Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with an average 7% increase in performance on datasets ranging from 10,000 to 120,000 transactions.
Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
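To make the idea of "avoiding dependent iterations" concrete, the following is a minimal single-machine sketch in Python. It is not the algorithm proposed in the paper, which this record does not describe in detail. It only illustrates, under stated assumptions, how itemset support counting can be expressed as one independent map step and one reduce step, instead of a sequence of passes in which pass k needs the frequent itemsets from pass k-1. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the `max_size` cap, and the sample transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size):
    """Emit (itemset, 1) for every itemset of 1..max_size items in one transaction.

    Each transaction is processed independently, so this step needs no
    results from any earlier pass (assumption: max_size is fixed up front).
    """
    items = sorted(set(transaction))  # canonical order so equal itemsets get equal keys
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset key."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets whose absolute support count is at least min_support."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
```

The trade-off in this sketch is that it gives up Apriori's candidate pruning: the number of itemsets emitted per transaction grows combinatorially with transaction length and `max_size`, so it is only practical for short transactions or a small cap.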
Pages: 20
Related papers
50 in total
  • [41] Big Data Analytics on High Velocity Streams: A Case Study
    Chardonnens, Thibaud
    Cudre-Mauroux, Philippe
    Grund, Martin
    Perroud, Benoit
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [42] Clustering on Big Data Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Khan, Shahbaz
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 789 - 795
  • [43] LARGE SCALE OPTIMIZATION TO MINIMIZE NETWORK TRAFFIC USING MAPREDUCE IN BIG DATA APPLICATIONS
    Neelakandan, S.
    Divyabharathi, S.
    Rahini, S.
    Vijayalakshmi, G.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTATION OF POWER, ENERGY INFORMATION AND COMMUNICATION (ICCPEIC), 2016, : 193 - 199
  • [44] Big Data Analytics in Telecommunication using state-of-the-art Big Data Framework in a Distributed Computing Environment: A Case Study
    Ved, Mohit
    Rizwanahmed, B.
    [J]. 2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2019, : 411 - 416
  • [45] Complete Storm Identification Algorithms from Big Raw Rainfall Data Using MapReduce Framework
    Jitkajornwanich, Kulsawasd
    Gupta, Upa
    Shanmuganathan, Sakthi Kumaran
    Elmasri, Ramez
    Fegaras, Leonidas
    McEnery, John
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [46] Time Series Data Mining: A Case Study With Big Data Analytics Approach
    Wang, Fang
    Li, Menggang
    Mei, Yiduo
    Li, Wenrui
    [J]. IEEE ACCESS, 2020, 8 : 14322 - 14328
  • [47] Data analytics and knowledge discovery on big data: Algorithms, architectures, and applications
    Wrembel, Robert
    Gamper, Johann
    [J]. DATA & KNOWLEDGE ENGINEERING, 2024, 150
  • [48] Data Lake: A Case of Study of a Big Data Analytics Architecture for Public Procurements
    Sosa, David
    Paciello, Julio
    [J]. 2021 EIGHT INTERNATIONAL CONFERENCE ON EDEMOCRACY & EGOVERNMENT (ICEDEG), 2021, : 194 - 198
  • [49] Performance Analysis of Machine Learning Algorithms on Diabetes Dataset using Big Data Analytics
    Kumar, P. Suresh
    Pranavi, S.
    [J]. 2017 INTERNATIONAL CONFERENCE ON INFOCOM TECHNOLOGIES AND UNMANNED SYSTEMS (TRENDS AND FUTURE DIRECTIONS) (ICTUS), 2017, : 508 - 513
  • [50] Effective Selection of Machine Learning Algorithms for Big Data Analytics Using Apache Spark
    Hafez, Manar Mohamed
    Shehab, Mohamed Elemam
    El Fakharany, Essam
    Hegazy, Abd El Ftah Abdel Ghfar
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 692 - 704