Understanding system design for Big Data workloads

Times cited: 1
Authors
Hofstee, H. Peter [1]
Chen, Guan Cheng [2]
Gebara, Fadi H. [1]
Hall, Kevin [3]
Herring, Jay [4]
Jamsek, Damir [1]
Li, Jian [1]
Li, Yan [2]
Shi, Ju Wei [2]
Wong, Peter Wai Yee [5]
Affiliations
[1] IBM Research Division, Austin Research Laboratory, Austin, TX 78758, USA
[2] IBM Research Division, China Research Laboratory, Beijing 100193, People's Republic of China
[3] IBM Global Business Services, Charlotte, NC 28262, USA
[4] IBM Systems and Technology Group, Poughkeepsie Development Laboratory, Poughkeepsie, NY 12601, USA
[5] IBM Research Division, Linux Technology Center, Austin, TX 78758, USA
DOI: 10.1147/JRD.2013.2242674
Chinese Library Classification: TP3 [Computing technology, computer technology]
Discipline code: 0812
Abstract
This paper explores the design and optimization implications for systems targeted at Big Data workloads. We confirm that these workloads differ in fundamental ways from the workloads typically run on more traditional transactional and data-warehousing systems; therefore, a system optimized for Big Data can be expected to differ from these other systems. Rather than studying only the performance of representative computational kernels and focusing on central-processing-unit (CPU) performance, this paper studies the system as a whole. We identify three major phases in a typical Big Data workload, and we propose that each of these phases should be represented in a Big Data systems benchmark. We implemented our ideas on two distinct IBM POWER7® processor-based systems that target different market sectors, and we analyze their performance on a sort benchmark. In particular, this paper includes an evaluation of POWER7 processor-based systems using MapReduce TeraSort, a workload that can serve as a "stress test" for multiple dimensions of system performance. We combine this work with a broader perspective on Big Data workloads and suggest a direction for a future benchmark definition effort. We also propose a number of methods to further improve system performance.
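The abstract names MapReduce TeraSort as its system stress test. As background only, the following is a minimal single-process Python sketch of the sample-based range partitioning that TeraSort-style sorts rely on (Hadoop's TeraSort example samples input keys to choose reducer boundaries, so that concatenating the sorted reducer outputs yields a globally sorted result). The function names `choose_split_points`, `partition`, and `terasort_like` and the parameters `num_reducers` and `sample_size` are hypothetical illustrations. This is not the implementation evaluated in the paper, and it ignores the disk, network, and memory costs that are exactly the system dimensions the paper studies.

```python
import bisect
import random


def choose_split_points(keys, num_reducers, sample_size, seed=0):
    """Pick num_reducers - 1 range boundaries from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample serve as range boundaries.
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def partition(records, split_points):
    """Map/shuffle step: route each (key, value) record to the reducer owning its key range."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        # bisect_right keeps equal keys in the same bucket and bucket order matches key order.
        buckets[bisect.bisect_right(split_points, key)].append((key, value))
    return buckets


def terasort_like(records, num_reducers=4, sample_size=100, seed=0):
    """Hypothetical TeraSort-style sort; returns (sorted records, per-reducer record counts)."""
    if not records:
        return [], [0] * num_reducers
    keys = [key for key, _ in records]
    splits = choose_split_points(keys, num_reducers, sample_size, seed)
    buckets = partition(records, splits)
    # Reduce step: each reducer sorts only its own key range.
    outputs = [sorted(bucket) for bucket in buckets]
    # Concatenating reducer outputs in range order gives a globally sorted result.
    merged = [record for output in outputs for record in output]
    return merged, [len(bucket) for bucket in buckets]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.randrange(10**6), i) for i in range(1000)]
    result, sizes = terasort_like(data, num_reducers=4)
    print("globally sorted:", result == sorted(data), "records per reducer:", sizes)
```

The sampling step is what lets each reducer work independently: a good sample balances the reducer loads, while a skewed sample leaves some reducers with far more data than others.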
Pages: 10
Related papers
  • [21] ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads
    Kalyanasundaram, Jayanth
    Simmhan, Yogesh
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2017, : 203 - 212
  • [22] BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework
    Zhu, Yuqing
    Zhan, Jianfeng
    Weng, Chuliang
    Nambiar, Raghunath
    Zhang, Jinchao
    Chen, Xingzhen
    Wang, Lei
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT II, 2014, 8422 : 483 - 492
  • [23] MREv: an Automatic MapReduce Evaluation Tool for Big Data Workloads
    Veiga, Jorge
    Exposito, Roberto R.
    Taboada, Guillermo L.
    Tourino, Juan
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE, 2015, 51 : 80 - 89
  • [24] Steering Query Optimizers: A Practical Take on Big Data Workloads
    Negi, Parimarjan
    Interlandi, Matteo
    Marcus, Ryan
    Alizadeh, Mohammad
    Kraska, Tim
    Friedman, Marc
    Jindal, Alekh
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2557 - 2569
  • [25] Autonomic Workload Change Classification and Prediction for Big Data Workloads
    Genkin, Mikhail
    Dehne, Frank
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2835 - 2844
  • [26] Holistic Disaster Recovery Approach For Big Data NoSQL Workloads
    Abadi, Aharon
    Haib, Ashraf
    Melamed, Roie
    Nassar, Alaa
    Shribman, Aidan
    Yasin, Hisham
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2075 - 2080
  • [28] Evaluation of Linux I/O Schedulers for Big Data Workloads
    Rezgui, Abdelmounaam
    White, Matthew
    Rezgui, Sami
    Malik, Zaki
    2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 227 - 234
  • [29] Data Motif-based Proxy Benchmarks for Big Data and AI Workloads
    Gao, Wanling
    Zhan, Jianfeng
    Wang, Lei
    Luo, Chunjie
    Jia, Zhen
    Zheng, Daoyi
    Zheng, Chen
    He, Xiwen
    Ye, Hainan
    Wang, Haibin
    Ren, Rui
    2018 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2018, : 48 - 58
  • [30] Big Data on Low Power Cores: Are Low Power Embedded Processors a good fit for the Big Data Workloads?
    Malik, Maria
    Homayoun, Houman
    2015 33RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2015, : 379 - 382