Understanding system design for Big Data workloads

Cited by: 1
Authors
Hofstee, H. Peter [1]
Chen, Guan Cheng [2]
Gebara, Fadi H. [1]
Hall, Kevin [3]
Herring, Jay [4]
Jamsek, Damir [1]
Li, Jian [1]
Li, Yan [2]
Shi, Ju Wei [2]
Wong, Peter Wai Yee [5]
Affiliations
[1] IBM Research Division, Austin Research Laboratory, Austin, TX 78758, USA
[2] IBM Research Division, China Research Laboratory, Beijing 100193, People's Republic of China
[3] IBM Global Business Services, Charlotte, NC 28262, USA
[4] IBM Systems and Technology Group, Poughkeepsie Development Laboratory, Poughkeepsie, NY 12601, USA
[5] IBM Research Division, Linux Technology Center, Austin, TX 78758, USA
DOI: 10.1147/JRD.2013.2242674
Chinese Library Classification: TP3 [Computing technology, computer technology]
Subject classification code: 0812
Abstract
This paper explores the design and optimization implications for systems targeted at Big Data workloads. We confirm that these workloads differ in fundamental ways from the workloads typically run on more traditional transactional and data-warehousing systems. A system optimized for Big Data can therefore be expected to differ from these other systems. Rather than studying only the performance of representative computational kernels, and focusing on central-processing-unit (CPU) performance, this paper studies the system as a whole. We identify three major phases in a typical Big Data workload, and we propose that each of these phases should be represented in a Big Data systems benchmark. We implemented our ideas on two distinct IBM POWER7® processor-based systems that target different market sectors, and we analyze their performance on a sort benchmark. In particular, this paper evaluates POWER7 processor-based systems using MapReduce TeraSort, a workload that can serve as a "stress test" for multiple dimensions of system performance. We combine this work with a broader perspective on Big Data workloads and suggest a direction for a future benchmark-definition effort. Finally, we propose several methods to further improve system performance.
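The abstract says TeraSort stresses several dimensions of a system at once. Part of the reason is how a distributed sort of this kind is usually organized. Hadoop's TeraSort example works in four steps:

1. Sample the input.
2. Choose split points so that each reducer receives a contiguous key range.
3. Shuffle every record to the reducer that owns its range.
4. Sort each range independently.

Reading the input loads storage, the shuffle loads the network, and the local sorts load the CPU and memory. The single-process Python sketch below illustrates only that partitioning idea and rests on several assumptions. All keys fit in memory. The function names (`choose_split_points`, `partition`, `terasort_like`) and their parameters are hypothetical. The sketch is not the paper's implementation or Hadoop's code.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 boundaries from a random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_partitions] for i in range(1, num_partitions)]


def partition(keys, split_points):
    """Map + shuffle stand-in: route each key to the bucket whose range contains it."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for k in keys:
        # Bucket i holds keys in [split_points[i-1], split_points[i]).
        buckets[bisect.bisect_right(split_points, k)].append(k)
    return buckets


def terasort_like(keys, num_partitions=4, sample_size=1000):
    """Sample, range-partition, sort each partition, then concatenate."""
    if not keys:
        return []
    splits = choose_split_points(keys, num_partitions, sample_size)
    buckets = partition(keys, splits)
    # Reduce stand-in: buckets cover disjoint, ordered key ranges, so
    # concatenating the individually sorted buckets yields a total order.
    return [k for bucket in buckets for k in sorted(bucket)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randrange(10**6) for _ in range(10_000)]
    assert terasort_like(data) == sorted(data)
```

In a real cluster each bucket would live on a different reducer. A skewed sample would leave some reducers with far more data than others. This is one reason the sampling step matters for overall system balance.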
Pages: 10