A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures

Cited by: 0
Authors
Liu, Xu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci MS 132, Houston, TX 77251 USA
Keywords
profiler; threads; NUMA; performance optimization; memory access pattern
DOI
10.1145/2692916.2555271
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Almost all of today's microprocessors contain memory controllers and directly attach to memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is faster for a microprocessor to access memory that is directly attached than it is to access memory attached to another processor. Without careful distribution of computation and data, a multithreaded program running on such a system may have high average memory access latency. To use multiprocessor systems efficiently, programmers need performance tools to guide the design of NUMA-aware codes. To address this need, we enhanced the HPCToolkit performance tools to support measurement and analysis of performance problems on multiprocessor systems with multiple NUMA domains. With these extensions, HPCToolkit helps pinpoint, quantify, and analyze NUMA bottlenecks in executions of multithreaded programs. It computes derived metrics to assess the severity of bottlenecks, analyzes memory accesses, and provides a wealth of information to guide NUMA optimization, including information about how to distribute data to reduce access latency and minimize contention. This paper describes the design and implementation of our extensions to HPCToolkit. We demonstrate their utility by describing case studies in which we use these capabilities to diagnose NUMA bottlenecks in four multithreaded applications.
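As a rough illustration of why data placement matters on a NUMA system, the sketch below computes the average memory access latency as a weighted mean of local and remote access costs. It is a minimal model under stated assumptions, not one of the derived metrics HPCToolkit computes: the function names, the two-level local/remote split, and the latency values in the usage note are all hypothetical.

```python
def remote_fraction(local_accesses: int, remote_accesses: int) -> float:
    """Fraction of sampled memory accesses that went to a remote NUMA domain."""
    total = local_accesses + remote_accesses
    if total == 0:
        raise ValueError("no memory accesses were recorded")
    return remote_accesses / total


def average_access_latency(
    local_accesses: int,
    remote_accesses: int,
    local_latency_ns: float,
    remote_latency_ns: float,
) -> float:
    """Weighted mean latency over local and remote accesses.

    Assumes a simplified two-level model: every access costs either
    local_latency_ns or remote_latency_ns (hypothetical parameters).
    """
    total = local_accesses + remote_accesses
    if total == 0:
        raise ValueError("no memory accesses were recorded")
    weighted = (local_accesses * local_latency_ns
                + remote_accesses * remote_latency_ns)
    return weighted / total
```

For example, with illustrative values of 100 ns for a local access and 200 ns for a remote one, `average_access_latency(75, 25, 100.0, 200.0)` gives 125 ns, while moving all data next to the threads that use it would bring the average back to 100 ns.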
Pages: 259-271
Page count: 13
Related papers
50 in total
  • [31] Impact of data distribution on performance of irregular reductions on multithreaded architectures
    Zoppetti, G
    Agrawal, G
    Kumar, R
    [J]. HIGH-PERFORMANCE COMPUTING AND NETWORKING, 2001, 2110 : 483 - 492
  • [32] Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference
    Zhao, Yong
    Rao, Jia
    Yi, Qing
    [J]. 2016 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES (PACT), 2016, : 287 - 297
  • [33] Automatic performance prediction of multithreaded programs: a simulation approach
    Alexander Tarvo
    Steven P. Reiss
    [J]. Automated Software Engineering, 2018, 25 : 101 - 155
  • [34] Delegation Locking Libraries for Improved Performance of Multithreaded Programs
    Klaftenegger, David
    Sagonas, Konstantinos
    Winblad, Kjell
    [J]. EURO-PAR 2014 PARALLEL PROCESSING, 2014, 8632 : 572 - 583
  • [35] Modeling and simulation of multithreaded architectures
    Vlassov, V
    Ayani, R
    Thorelli, LE
    [J]. SIMULATION, 1997, 68 (04) : 219 - 230
  • [36] Evaluation of memory performance in NUMA architectures using Stochastic Reward Nets
    Entezari-Maleki, Reza
    Cho, Younghyun
    Egger, Bernhard
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 144 : 172 - 188
  • [37] Modeling and simulation of multithreaded architectures
    Royal Inst of Technology, Stockholm, Sweden
    [J]. Simulation, 4 (219-230):
  • [38] Analytical modeling of multithreaded architectures
    Vlassov, V
    Ayani, R
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2000, 46 (13) : 1205 - 1230
  • [39] LCT: A Parallel Distributed Testing Tool for Multithreaded Java Programs
    Kahkonen, Kari
    Saarikivi, Olli
    Heljanko, Keijo
    [J]. ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2013, 296 : 253 - 259
  • [40] Statistical simulation of multithreaded architectures
    Kihm, JL
    Connors, DA
    [J]. MASCOTS 2005:13TH IEEE INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS, 2005, : 67 - 74