Understanding system design for Big Data workloads

Times cited: 1
Authors
Hofstee, H. Peter [1]
Chen, Guan Cheng [2]
Gebara, Fadi H. [1]
Hall, Kevin [3]
Herring, Jay [4]
Jamsek, Damir [1]
Li, Jian [1]
Li, Yan [2]
Shi, Ju Wei [2]
Wong, Peter Wai Yee [5]
Affiliations
[1] IBM Research Division, Austin Research Laboratory, Austin, TX 78758 USA
[2] IBM Research Division, China Research Laboratory, Beijing 100193, People's Republic of China
[3] IBM Global Business Services, Charlotte, NC 28262 USA
[4] IBM Systems and Technology Group, Poughkeepsie Development Laboratory, Poughkeepsie, NY 12601 USA
[5] IBM Research Division, Linux Technology Center, Austin, TX 78758 USA
DOI
10.1147/JRD.2013.2242674
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper explores the design and optimization implications for systems targeted at Big Data workloads. We confirm that these workloads differ in fundamental ways from the workloads typically run on more traditional transactional and data-warehousing systems, and, therefore, a system optimized for Big Data can be expected to differ from these other systems. Rather than only studying the performance of representative computational kernels and focusing on central-processing-unit (CPU) performance, this paper studies the system as a whole. We identify three major phases in a typical Big Data workload, and we propose that each of these phases should be represented in a Big Data systems benchmark. We implemented our ideas on two distinct IBM POWER7® processor-based systems that target different market sectors, and we analyze their performance on a sort benchmark. In particular, this paper includes an evaluation of POWER7 processor-based systems using MapReduce TeraSort, a workload that can serve as a "stress test" for multiple dimensions of system performance. We combine this work with a broader perspective on Big Data workloads and suggest a direction for a future benchmark definition effort. We also propose a number of methods to further improve system performance.
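Part of why TeraSort can stress several system dimensions at once is its structure: every record is read, routed to the reducer that owns its key range, sorted, and written back out. The following is a minimal single-process Python sketch of that flow. It assumes Hadoop-style 100-byte records whose first 10 bytes are the key. It is not Hadoop's implementation and not the configuration evaluated in the paper, and the names `gen_records`, `build_split_points`, and `terasort_sketch` are hypothetical. Split points are drawn from a sorted random sample of keys. Because each partition then holds a disjoint, increasing key range, concatenating the reducers' outputs in order yields a globally sorted result.

```python
import random
from bisect import bisect_right

# Simplified, single-process sketch of a TeraSort-style sort (not Hadoop's code).
# Assumption: Hadoop-style records of 100 bytes whose first 10 bytes are the key.
KEY_LEN = 10
RECORD_LEN = 100


def gen_records(n, seed=0):
    """Hypothetical stand-in for TeraGen: n random 100-byte records."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(RECORD_LEN)) for _ in range(n)]


def build_split_points(records, num_reducers, sample_size, seed=0):
    """Sample keys and pick num_reducers - 1 evenly spaced cut points."""
    if not records:
        return []
    rng = random.Random(seed)
    picked = rng.sample(records, min(sample_size, len(records)))
    sample = sorted(r[:KEY_LEN] for r in picked)
    return [sample[i * len(sample) // num_reducers] for i in range(1, num_reducers)]


def terasort_sketch(records, num_reducers=4, sample_size=1000):
    """Range-partition records by key, sort each partition, concatenate in order."""
    splits = build_split_points(records, num_reducers, sample_size)
    # Map + shuffle: route each record to the reducer owning its key range.
    # bisect_right sends keys equal to a split point to the right-hand partition.
    partitions = [[] for _ in range(num_reducers)]
    for rec in records:
        partitions[bisect_right(splits, rec[:KEY_LEN])].append(rec)
    # Reduce: each reducer sorts only its own partition.
    for part in partitions:
        part.sort(key=lambda rec: rec[:KEY_LEN])
    # Partitions hold disjoint, increasing key ranges, so concatenation is sorted.
    output = [rec for part in partitions for rec in part]
    return output, [len(part) for part in partitions]


if __name__ == "__main__":
    data = gen_records(2000, seed=1)
    result, sizes = terasort_sketch(data)
    keys = [rec[:KEY_LEN] for rec in result]
    print("records:", len(result), "partition sizes:", sizes)
    print("globally sorted:", keys == sorted(keys))
```

In a real cluster, the routing loop corresponds to the shuffle that moves data across the network, while reading the input and writing the sorted partitions exercise storage. This is one reason a single sort run can load I/O, network, memory, and CPU together.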
Pages: 10
Related papers (50 in total)
  • [41] Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads
    Sarnayak, Samyak S.
    Ahuja, Aditi
    Kesavarapu, Pranav
    Naik, Aayush
    Kumar, Santhosh
    Kalambur, Subramaniam
    COMPANION OF THE 2022 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE 2022, 2022, : 121 - 128
  • [42] Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads
    Clapp, Russell
    Dimitrov, Martin
    Kumar, Karthik
    Viswanathan, Vish
    Willhalm, Thomas
    2015 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2015, : 213 - 224
  • [43] Accelerating data mining workloads: current approaches and future challenges in system architecture design
    Choudhary, Alok N.
    Honbo, Daniel
    Kumar, Prabhat
    Ozisikyilmaz, Berkin
    Misra, Sanchit
    Memik, Gokhan
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (01) : 41 - 54
  • [44] A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability
    Uta, Alexandru
    Obaseki, Harry
    COMPANION OF THE 2018 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE '18), 2018, : 113 - 118
  • [45] A Study on the Causes of Garbage Collection in Java for Big Data Workloads
    Sriram, Aiswarya
    Nair, Advithi
    Simon, Alka
    Kalambur, Subramaniam
    Sitaram, Dinkar
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5831 - 5833
  • [46] Main memory controller with multiple media technologies for big data workloads
    Avargues, Miguel A.
    Lurbe, Manel
    Petit, Salvador
    Gomez, Maria E.
    Yang, Rui
    Zhu, Xiaoping
    Wang, Guanhao
    Sahuquillo, Julio
    Journal of Big Data, 10
  • [47] BIG data - BIG gains? Understanding the link between big data analytics and innovation
    Niebel, Thomas
    Rasel, Fabienne
    Viete, Steffen
    ECONOMICS OF INNOVATION AND NEW TECHNOLOGY, 2019, 28 (03) : 296 - 316
  • [49] Design Analytics: Capturing, Understanding, and Meeting Customer Needs Using Big Data
    Van Horn, David
    Olewnik, Andrew
    Lewis, Kemper
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, VOL 7, 2012, : 863 - +
  • [50] Understanding big consumer opinion data for market-driven product design
    Jin, Jian
    Liu, Ying
    Ji, Ping
    Liu, Hongguang
    INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2016, 54 (10) : 3019 - 3041