Using Cycle Stacks to Understand Scaling Bottlenecks in Multi-Threaded Workloads

Times cited: 0
Authors
Heirman, Wim [1, 3]
Carlson, Trevor E. [1, 3]
Che, Shuai [2]
Skadron, Kevin [2]
Eeckhout, Lieven [1]
Affiliations
[1] Univ Ghent, Dept Elect & Informat Syst, Ghent, Belgium
[2] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[3] Intel Exasci Lab, Ghent, Belgium
Funding
European Research Council
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
This paper proposes a methodology for analyzing parallel performance by building cycle stacks. A cycle stack quantifies where the cycles have gone and provides hints towards optimization opportunities. We make the case that this is particularly useful for analyzing parallel performance: understanding how cycle components scale with increasing core counts and/or input data set sizes yields insight into scaling bottlenecks due to synchronization, load imbalance, poor memory performance, etc. We present several case studies illustrating the use of cycle stacks. As a subsequent step, we extend the methodology to analyze sets of parallel workloads using statistical data analysis, and perform a workload characterization to understand behavioral differences across benchmark suites. We analyze the SPLASH-2, PARSEC and Rodinia benchmark suites and conclude that the three suites cover similar areas in the workload space. However, the scaling behavior of these benchmarks towards larger input sets and/or higher core counts depends strongly on the benchmark, on the way in which the inputs have been scaled, and on the machine configuration.
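The cycle-stack idea in the abstract can be illustrated with a small sketch. It assumes a hypothetical breakdown of each thread's cycles into `base`, `memory` and `sync` components, attributes the idle time of threads that finish before the slowest one to an extra `imbalance` component, and normalizes each stack to total core-cycles relative to the single-core run so that stacks at different core counts can be compared. The component names, the imbalance rule, the normalization, the helper names `cycle_stack` and `normalized_stacks`, and all cycle counts are illustrative assumptions, not the authors' exact method or data.

```python
# Minimal cycle-stack sketch. Component names, the imbalance rule and the
# normalization are illustrative assumptions, not the paper's exact method.

COMPONENTS = ("base", "memory", "sync")  # assumed per-thread categories


def cycle_stack(per_thread):
    """Average per-thread cycle counts into one cycle stack.

    per_thread: one dict per thread mapping each name in COMPONENTS to cycles.
    Threads that finish before the slowest one wait idle; those cycles go to
    an extra 'imbalance' component, so the stack height equals the parallel
    execution time (the slowest thread's total).
    """
    totals = [sum(t[c] for c in COMPONENTS) for t in per_thread]
    runtime = max(totals)
    n = len(per_thread)
    stack = {c: sum(t[c] for t in per_thread) / n for c in COMPONENTS}
    stack["imbalance"] = sum(runtime - tot for tot in totals) / n
    return stack


def normalized_stacks(stacks_by_cores):
    """Convert stacks to core-cycles, normalized to the 1-core execution time.

    stacks_by_cores: dict mapping core count -> stack from cycle_stack();
    it must contain an entry for 1 core. Component c at n cores becomes
    n * stack[c] / T1. The stack height is then n / speedup: 1.0 under
    perfect scaling, and growth of a component above its 1-core value is
    overhead introduced by running in parallel.
    """
    t1 = sum(stacks_by_cores[1].values())
    return {
        n: {c: n * cycles / t1 for c, cycles in stack.items()}
        for n, stack in sorted(stacks_by_cores.items())
    }


if __name__ == "__main__":
    # Made-up cycle counts for one run on 1 core and one run on 4 cores.
    one = cycle_stack([{"base": 800, "memory": 150, "sync": 50}])
    four = cycle_stack([
        {"base": 200, "memory": 60, "sync": 20},
        {"base": 200, "memory": 60, "sync": 40},
        {"base": 200, "memory": 70, "sync": 30},
        {"base": 200, "memory": 50, "sync": 10},
    ])
    for n, stack in normalized_stacks({1: one, 4: four}).items():
        parts = {c: round(v, 2) for c, v in stack.items()}
        print(n, "cores:", parts, "height:", round(sum(stack.values()), 2))
```

With these made-up counts, the 4-core stack has height 1.2, i.e. a speedup of 4 / 1.2 ≈ 3.3. The 0.2 above perfect scaling splits into 0.09 extra memory cycles, 0.05 extra synchronization cycles and 0.06 load imbalance, while the base component stays flat. This per-component view of scaling is the kind of analysis the abstract describes.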
Pages: 38-49
Page count: 12