On using MapReduce to scale algorithms for Big Data analytics: a case study

Authors
Kijsanayothin, Phongphun [1]
Chalumporn, Gantaphon [2]
Hewett, Rattikorn [2]
Affiliations
[1] Naresuan Univ, Dept Elect & Comp Engn, NU, Phitsanulok, Thailand
[2] Texas Tech Univ, Dept Comp Sci, TTU, Lubbock, TX 79409 USA
Keywords
Big Data analytics algorithms; Association rules mining; MapReduce; Parallel computing; A-PRIORI ALGORITHM; PARALLEL; MODEL
DOI
10.1186/s40537-019-0269-1
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Introduction: Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines, has contributed to advances in many Big Data analytics algorithms. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.

Case description: This paper investigates a case study of a scaling problem of "Big algorithms" for a popular association rule-mining algorithm, specifically the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with a 7% average increase in performance for transaction counts ranging from 10,000 to 120,000.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
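To make the "dependent iterations" of level-wise Apriori concrete, here is a minimal single-process sketch in Python. It is not the authors' algorithm or any of the compared solutions: the function names (`map_count`, `reduce_counts`, `levelwise_apriori`) and the in-memory list of partitions are assumptions made purely for illustration. It only shows why a naive port needs one MapReduce-style job per itemset size, since the candidates for size k cannot be generated until the job for size k-1 has finished.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step (illustrative): count candidate itemsets in one partition."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step (illustrative): sum partial counts, keep frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def levelwise_apriori(partitions, min_support):
    """Naive level-wise Apriori: one map/reduce round per itemset size."""
    items = {i for part in partitions for t in part for i in t}
    candidates = {frozenset([i]) for i in items}
    frequent, k = {}, 1
    while candidates:
        # One simulated MapReduce job for itemsets of size k.
        level = reduce_counts(
            [map_count(part, candidates) for part in partitions], min_support
        )
        frequent.update(level)
        k += 1
        # Dependency: size-k candidates are built from the size-(k-1) results,
        # so the next job cannot start before this one completes.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: two partitions of transactions.
partitions = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ac"), frozenset("b")],
]
result = levelwise_apriori(partitions, min_support=2)
```

In a real cluster each pass through the loop would be a separately scheduled job with its own startup and I/O cost, which is the overhead the abstract's conclusion points to.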
Pages: 20
Journal
Journal of Big Data, 6