A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures

Authors
Liu, Xu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci MS 132, Houston, TX 77251 USA
Keywords
profiler; threads; NUMA; performance optimization; memory access pattern
DOI
10.1145/2692916.2555271
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Almost all of today's microprocessors contain memory controllers and directly attach to memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is faster for a microprocessor to access memory that is directly attached than it is to access memory attached to another processor. Without careful distribution of computation and data, a multithreaded program running on such a system may have high average memory access latency. To use multiprocessor systems efficiently, programmers need performance tools to guide the design of NUMA-aware codes. To address this need, we enhanced the HPCToolkit performance tools to support measurement and analysis of performance problems on multiprocessor systems with multiple NUMA domains. With these extensions, HPCToolkit helps pinpoint, quantify, and analyze NUMA bottlenecks in executions of multithreaded programs. It computes derived metrics to assess the severity of bottlenecks, analyzes memory accesses, and provides a wealth of information to guide NUMA optimization, including information about how to distribute data to reduce access latency and minimize contention. This paper describes the design and implementation of our extensions to HPCToolkit. We demonstrate their utility by describing case studies in which we use these capabilities to diagnose NUMA bottlenecks in four multithreaded applications.
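To make the abstract's point about data distribution concrete, the following is a minimal sketch of how a remote-access fraction and an average access latency could be derived from sampled memory accesses. It is not HPCToolkit's implementation or its actual metric definitions. The function names (`accesses`, `numa_summary`), the sample format, and the latency values of 100 and 200 cycles are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter


def accesses(num_threads, threads_per_domain, placement):
    """Build hypothetical access samples as (thread_domain, data_domain) pairs.

    Thread i touches data block i; placement(i) returns the NUMA domain
    that holds block i. Threads are assigned to domains in contiguous groups.
    """
    return [(i // threads_per_domain, placement(i)) for i in range(num_threads)]


def numa_summary(samples, local_latency=100.0, remote_latency=200.0):
    """Return (remote_fraction, average_latency) for the given samples.

    The latency arguments are illustrative placeholders (assumed values),
    not measurements from any real machine.
    """
    counts = Counter("local" if t == d else "remote" for t, d in samples)
    total = counts["local"] + counts["remote"]
    if total == 0:
        raise ValueError("no samples")
    remote_fraction = counts["remote"] / total
    average_latency = (
        counts["local"] * local_latency + counts["remote"] * remote_latency
    ) / total
    return remote_fraction, average_latency


# All data placed in domain 0: threads in domain 1 access it remotely.
centralized = numa_summary(accesses(8, 4, lambda i: 0))

# Each block placed in the domain of the thread that uses it.
distributed = numa_summary(accesses(8, 4, lambda i: i // 4))

print(centralized, distributed)
```

Under these assumed latencies, placing every block in one domain makes half of the accesses remote, while co-locating each block with the thread that uses it makes every access local.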
Pages: 259-271
Page count: 13