A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures

Cited by: 0
Authors
Liu, Xu [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice Univ, Dept Comp Sci MS 132, Houston, TX 77251 USA
Keywords
profiler; threads; NUMA; performance optimization; memory access pattern;
DOI
10.1145/2692916.2555271
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Almost all of today's microprocessors contain memory controllers and directly attach to memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is faster for a microprocessor to access memory that is directly attached than it is to access memory attached to another processor. Without careful distribution of computation and data, a multithreaded program running on such a system may have high average memory access latency. To use multiprocessor systems efficiently, programmers need performance tools to guide the design of NUMA-aware codes. To address this need, we enhanced the HPCToolkit performance tools to support measurement and analysis of performance problems on multiprocessor systems with multiple NUMA domains. With these extensions, HPCToolkit helps pinpoint, quantify, and analyze NUMA bottlenecks in executions of multithreaded programs. It computes derived metrics to assess the severity of bottlenecks, analyzes memory accesses, and provides a wealth of information to guide NUMA optimization, including information about how to distribute data to reduce access latency and minimize contention. This paper describes the design and implementation of our extensions to HPCToolkit. We demonstrate their utility by describing case studies in which we use these capabilities to diagnose NUMA bottlenecks in four multithreaded applications.
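The abstract mentions derived metrics that assess the severity of NUMA bottlenecks. As a rough illustration only, the sketch below computes two simple quantities from sampled memory accesses: the fraction of accesses that touch a remote NUMA domain, and the average latency of local versus remote accesses. The `AccessSample` type, the `numa_summary` function, and the latency values are all hypothetical and are not taken from HPCToolkit or from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessSample:
    """One sampled memory access (hypothetical record layout)."""
    cpu_domain: int   # NUMA domain of the thread issuing the access
    mem_domain: int   # NUMA domain whose memory holds the data
    latency: float    # observed latency, in cycles (assumed unit)


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def numa_summary(samples):
    """Summarize how often, and at what cost, accesses leave the local domain."""
    local = [s.latency for s in samples if s.cpu_domain == s.mem_domain]
    remote = [s.latency for s in samples if s.cpu_domain != s.mem_domain]
    total = len(samples)
    return {
        "remote_fraction": len(remote) / total if total else 0.0,
        "avg_local_latency": _mean(local),
        "avg_remote_latency": _mean(remote),
    }


# Illustrative data: two threads, each making one local and one remote access.
samples = [
    AccessSample(cpu_domain=0, mem_domain=0, latency=100.0),
    AccessSample(cpu_domain=0, mem_domain=1, latency=300.0),
    AccessSample(cpu_domain=1, mem_domain=1, latency=100.0),
    AccessSample(cpu_domain=1, mem_domain=0, latency=300.0),
]
summary = numa_summary(samples)
print(summary)
```

A high remote fraction combined with a large gap between the two averages would suggest that redistributing data closer to the threads that use it is worth investigating.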
Pages: 259 - 271
Page count: 13
Related papers
50 in total
  • [21] Redeeming IPC as a performance metric for multithreaded programs
    Lepak, KM
    Cain, HW
    Lipasti, MH
    [J]. 12TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 2003, : 232 - 243
  • [22] MULTITHREADED PROCESSOR ARCHITECTURES
    BYRD, CT
    HOLLIDAY, MA
    [J]. IEEE SPECTRUM, 1995, 32 (08) : 38 - 46
  • [23] Multithreaded vector architectures
    Espasa, R
    Valero, M
    [J]. THIRD INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE - PROCEEDINGS, 1997, : 237 - 248
  • [24] Maple: A Coverage-Driven Testing Tool for Multithreaded Programs
    Yu, Jie
    Narayanasamy, Satish
    Pereira, Cristiano
    Pokam, Gilles
    [J]. ACM SIGPLAN NOTICES, 2012, 47 (10) : 485 - 502
  • [25] Performance analysis of four parallel programming models on NUMA architectures
    Mohamed, AS
    Cantonnet, F
    [J]. PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 2003, : 119 - 125
  • [26] Evaluating Performance and Energy Consumption of Multithreaded Applications in cc-NUMA Multicore Processors
    Cai, Min
    Fang, Juan
    Song, Shu-ying
    Ji, Jun-zhong
    Li, Bin
    [J]. COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017, : 46 - 54
  • [27] Optimizing operating system performance for CC-NUMA architectures
    Chang, MS
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2003, 15 (14) : 1257 - 1274
  • [28] SPADE: Verification of multithreaded dynamic and recursive programs - (Tool paper)
    Patin, Gael
    Sighireanu, Mihaela
    Touili, Tayssir
    [J]. COMPUTER AIDED VERIFICATION, PROCEEDINGS, 2007, 4590 : 254 - +
  • [29] ThreadMon: A tool for monitoring multithreaded program performance
    Cantrill, BM
    Doeppner, TW
    [J]. THIRTIETH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOL 1: SOFTWARE TECHNOLOGY AND ARCHITECTURE, 1997, : 253 - 265
  • [30] Automatic performance prediction of multithreaded programs: a simulation approach
    Tarvo, Alexander
    Reiss, Steven P.
    [J]. AUTOMATED SOFTWARE ENGINEERING, 2018, 25 (01) : 101 - 155