An Analytical Approach to Evaluation of SSD Effects under MapReduce Workloads

Cited by: 10
Authors
Ahn, Sungyong
Park, Sangkyu
Affiliation
[1] DS Software R and D Center, Samsung Electronics Co. Ltd
Keywords
MapReduce; Hadoop; performance modeling; SSDs; cost-per-performance
DOI
10.5573/JSTS.2015.15.5.511
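Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract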
As the cost-per-byte of SSDs decreases dramatically, introducing SSDs into Hadoop becomes an attractive choice for high-performance data processing. In this paper, the cost-per-performance of an SSD-based Hadoop cluster (SSD-Hadoop) and an HDD-based Hadoop cluster (HDD-Hadoop) is evaluated. To this end, we propose a MapReduce performance model based on a queuing network to simulate the execution time of a MapReduce job as the cluster size varies. To make the model accurate, the execution time distribution of MapReduce jobs is carefully profiled. The resulting model predicts the execution time of MapReduce jobs with a difference of less than 7% in most cases. Simulations with varying numbers of cluster nodes also show that SSD-Hadoop is 20% more cost-efficient than HDD-Hadoop, because SSD-Hadoop needs fewer nodes than HDD-Hadoop to achieve comparable performance.
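The cost-per-performance comparison described in the abstract can be illustrated with a minimal sketch. It assumes cluster cost is simply node count times per-node cost, and that both clusters are sized to reach comparable job execution time. The function names, node counts, and prices below are hypothetical placeholders chosen for illustration; they are not taken from the paper and do not reproduce its 20% result.

```python
def cluster_cost(nodes: int, cost_per_node: float) -> float:
    """Total hardware cost of a cluster (assumed: nodes x per-node cost)."""
    return nodes * cost_per_node


def relative_saving(baseline_cost: float, candidate_cost: float) -> float:
    """Fraction of the baseline cost saved by the candidate cluster."""
    return (baseline_cost - candidate_cost) / baseline_cost


# Hypothetical inputs: the node counts each cluster type would need to reach
# the same target execution time, and made-up per-node prices.
hdd_cost = cluster_cost(nodes=10, cost_per_node=1000.0)
ssd_cost = cluster_cost(nodes=6, cost_per_node=1250.0)

saving = relative_saving(hdd_cost, ssd_cost)
print(f"SSD cluster is {saving:.0%} cheaper at comparable performance")
```

In this toy setup the SSD nodes cost more individually, but the smaller node count outweighs the per-node premium, which is the kind of trade-off the abstract describes.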
Pages: 511-518
Page count: 8