ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs

Cited by: 25
Authors
Liu, Xu [1]
Wu, Bo [2]
Affiliations
[1] Coll William & Mary, Dept Comp Sci, Williamsburg, VA 23185 USA
[2] Colorado Sch Mines, Dept EECS, Golden, CO 80401 USA
Keywords
Memory bottlenecks; scalability; parallel profiler
DOI
10.1145/2807591.2807648
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
It is difficult to scale parallel programs on a system that employs a large number of cores. To identify scalability bottlenecks, existing tools principally pinpoint poor thread synchronization strategies or unnecessary data communication. The memory subsystem is one of the key contributors to poor parallel scaling on multicore machines. State-of-the-art tools, however, either lack sophisticated capabilities for pinpointing scalability bottlenecks arising from the memory subsystem or ignore such bottlenecks entirely. To address this issue, we develop ScaAnalyzer, a tool that pinpoints scaling losses due to poor memory access behaviors of parallel programs. ScaAnalyzer collects, attributes, and analyzes memory-related metrics during program execution while incurring very low overhead. ScaAnalyzer provides high-level, detailed guidance to programmers for scalability optimization. We demonstrate the utility of ScaAnalyzer with case studies of three parallel programs. For each benchmark, ScaAnalyzer identifies scalability bottlenecks caused by poor memory access behaviors and provides optimization guidance that yields significant improvement in scalability.
Pages: 12