(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads

Authors
Majo, Zoltan [1]
Gross, Thomas R. [1]
Affiliation
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload's interaction with the memory system). On systems with a non-uniform memory architecture (NUMA), performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence on performance on systems with aggressive prefetcher units. This paper describes an analysis of the memory system performance of multithreaded programs and shows that some programs are (unintentionally) structured so that they use the memory system of today's NUMA-multicores inefficiently: programs exhibit program-level data sharing, a performance-limiting factor that makes data and computation distribution in NUMA systems difficult. Moreover, many programs have irregular memory access patterns that are hard for processor prefetcher units to predict. The memory system performance observed for a given program on a specific platform also depends on many algorithmic and implementation decisions. The paper shows that a set of simple algorithmic changes coupled with commonly available OS functionality suffices to eliminate data sharing and to regularize the memory access patterns for a subset of the PARSEC parallel benchmarks. These simple source-level changes result in performance improvements of up to 3.1X, but more importantly, they lead to a fairer and more accurate performance evaluation on NUMA-multicore systems. They also illustrate the importance of carefully considering all details of algorithms and architectures to avoid drawing incorrect conclusions.
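The abstract contrasts program-level data sharing with programs whose threads work on disjoint data. The toy Python model below is only an illustration under stated assumptions, not the authors' code and not a model of any PARSEC benchmark. It assumes a first-touch style placement policy (a page is allocated on the NUMA node of the thread that first writes it, which is how Linux behaves under its default local allocation policy) and counts what fraction of page accesses cross node boundaries. All function names (`block_owner`, `first_touch_placement`, `shared_accesses`, `partitioned_accesses`, `remote_fraction`) are hypothetical.

```python
"""Toy model (illustrative assumption): NUMA page placement under first touch,
comparing a shared access pattern with a partitioned one."""


def block_owner(page: int, num_pages: int, num_threads: int) -> int:
    """Thread that owns `page` under an even, contiguous block partition."""
    return page * num_threads // num_pages


def first_touch_placement(num_pages: int, num_threads: int,
                          threads_per_node: int) -> list[int]:
    """Assume each thread initializes (first-touches) its own block of pages,
    so each page lands on the NUMA node of its owning thread."""
    return [block_owner(p, num_pages, num_threads) // threads_per_node
            for p in range(num_pages)]


def shared_accesses(num_pages: int, num_threads: int) -> dict[int, list[int]]:
    """Program-level data sharing: every thread reads every page."""
    return {t: list(range(num_pages)) for t in range(num_threads)}


def partitioned_accesses(num_pages: int, num_threads: int) -> dict[int, list[int]]:
    """Restructured program: each thread touches only the block it initialized."""
    return {t: [p for p in range(num_pages)
                if block_owner(p, num_pages, num_threads) == t]
            for t in range(num_threads)}


def remote_fraction(placement: list[int], accesses: dict[int, list[int]],
                    threads_per_node: int) -> float:
    """Fraction of accesses that target a page placed on another NUMA node."""
    local = remote = 0
    for tid, pages in accesses.items():
        node = tid // threads_per_node  # threads are packed onto nodes in order
        for p in pages:
            if placement[p] == node:
                local += 1
            else:
                remote += 1
    return remote / (local + remote)


if __name__ == "__main__":
    P, T, TPN = 8, 4, 2  # 8 pages, 4 threads, 2 threads per node -> 2 nodes
    placement = first_touch_placement(P, T, TPN)
    print(remote_fraction(placement, shared_accesses(P, T), TPN))       # 0.5
    print(remote_fraction(placement, partitioned_accesses(P, T), TPN))  # 0.0
```

In this two-node setup, the shared pattern leaves half of all accesses remote no matter where a page is placed, because every page is read by threads on both nodes; partitioning the work so that each thread reads only what it initialized makes every access local. The model deliberately ignores caches, bandwidth contention, and prefetching.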
Pages: 11-22
Number of pages: 12