Understanding system design for Big Data workloads

Authors
Hofstee, H. Peter [1 ]
Chen, Guan Cheng [2 ]
Gebara, Fadi H. [1 ]
Hall, Kevin [3 ]
Herring, Jay [4 ]
Jamsek, Damir [1 ]
Li, Jian [1 ]
Li, Yan [2 ]
Shi, Ju Wei [2 ]
Wong, Peter Wai Yee [5 ]
Affiliations
[1] IBM Res Div, Austin Res Lab, Austin, TX 78758 USA
[2] IBM Res Div, China Res Lab, Beijing 100193, Peoples R China
[3] IBM Global Business Serv, Charlotte, NC 28262 USA
[4] IBM Syst & Technol Grp, Poughkeepsie Dev Lab, Poughkeepsie, NY 12601 USA
[5] IBM Res Div, Linux Technol Ctr, Austin, TX 78758 USA
DOI
10.1147/JRD.2013.2242674
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper explores the design and optimization implications for systems targeted at Big Data workloads. We confirm that these workloads differ from workloads typically run on more traditional transactional and data-warehousing systems in fundamental ways, and, therefore, a system optimized for Big Data can be expected to differ from these other systems. Rather than only studying the performance of representative computational kernels, and focusing on central-processing-unit performance, this paper studies the system as a whole. We identify three major phases in a typical Big Data workload, and we propose that each of these phases should be represented in a Big Data systems benchmark. We implemented our ideas on two distinct IBM POWER7® processor-based systems that target different market sectors, and we analyze their performance on a sort benchmark. In particular, this paper includes an evaluation of POWER7 processor-based systems using MapReduce TeraSort, which is a workload that can be a "stress test" for multiple dimensions of system performance. We combine this work with a broader perspective on Big Data workloads and suggest a direction for a future benchmark definition effort. A number of methods to further improve system performance are proposed.
Pages: 10