MREv: an Automatic MapReduce Evaluation Tool for Big Data Workloads

Cited by: 6
Authors
Veiga, Jorge [1]
Exposito, Roberto R. [1]
Taboada, Guillermo L. [1]
Tourino, Juan [1]
Affiliations
[1] Univ A Coruna, Comp Architecture Grp, La Coruna, Spain
Keywords
High Performance Computing (HPC); Big Data; MapReduce; Performance Evaluation; Resource Efficiency; InfiniBand
DOI
10.1016/j.procs.2015.05.202
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case depends on its design and implementation, the underlying system resources and the type of application to be run. Therefore, the appropriate selection of one of these frameworks generally involves the execution of multiple experiments in order to assess their performance, scalability and resource efficiency. This work studies the main issues of this evaluation, proposing a new MapReduce Evaluator (MREv) tool which unifies the configuration of the frameworks, eases the task of collecting results and generates resource utilization statistics. Moreover, a practical use case is described, including examples of the experimental results provided by this tool. MREv is available to download at http://mrev.des.udc.es.
Pages: 80-89
Number of pages: 10
Related papers
50 in total
  • [1] Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads
    Satish, Rama K., V
    Kavya, N. P.
    [J]. 2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 49 - 54
  • [2] Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads
    Chen, Yanpei
    Alspaugh, Sara
    Katz, Randy
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12): : 1802 - 1813
  • [3] Performance Evaluation of Big Data Frameworks: MapReduce and Spark
    Singh, Jaspreet
    Panda, S. N.
    Kaushal, Rajesh
    [J]. INTELLIGENT COMMUNICATION, CONTROL AND DEVICES, ICICCD 2017, 2018, 624 : 1611 - 1619
  • [4] Evaluation of Linux I/O Schedulers for Big Data Workloads
    Rezgui, Abdelmounaam
    White, Matthew
    Rezgui, Sami
    Malik, Zaki
    [J]. 2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 227 - 234
  • [5] MapReduce Clustering for Big Data
    Ghattas, Badih
    Pinto, Antoine
    Diao, Sambou
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5116 - 5124
  • [6] Challenges for MapReduce in Big Data
    Grolinger, Katarina
    Hayes, Michael
    Higashino, Wilson A.
    L'Heureux, Alexandra
    Allison, David S.
    Capretz, Miriam A. M.
    [J]. 2014 IEEE WORLD CONGRESS ON SERVICES (SERVICES), 2014, : 182 - 189
  • [7] Characterizing and Subsetting Big Data Workloads
    Jia, Zhen
    Zhan, Jianfeng
    Wang, Lei
    Han, Rui
    McKee, Sally A.
    Yang, Qiang
    Luo, Chunjie
    Li, Jingwei
    [J]. 2014 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2014, : 191 - 201
  • [8] A Crowdsourcing Worker Quality Evaluation Algorithm on MapReduce for Big Data Applications
    Dang, Depeng
    Liu, Ying
    Zhang, Xiaoran
    Huang, Shihang
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (07) : 1879 - 1888
  • [9] MapReduce: Simplified Data Analysis of Big Data
    Maitrey, Seema
    Jha, C. K.
    [J]. 3RD INTERNATIONAL CONFERENCE ON RECENT TRENDS IN COMPUTING 2015 (ICRTC-2015), 2015, 57 : 563 - 571
  • [10] A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment
    Phuong Nguyen
    Simon, Tyler
    Halem, Milton
    Chapman, David
    Le, Quang
    [J]. 2012 IEEE/ACM FIFTH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC 2012), 2012, : 161 - 167