(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads

Cited by: 0

Authors:

Majo, Zoltan [1]

Gross, Thomas R. [1]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

Source:

2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013) | 2013

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload's interaction with the memory system). On systems with a non-uniform memory architecture (NUMA) the performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence on performance on systems with aggressive prefetcher units. This paper describes an analysis of the memory system performance of multithreaded programs and shows that some programs are (unintentionally) structured so that they use the memory system of today's NUMA-multicores inefficiently: Programs exhibit program-level data sharing, a performance-limiting factor that makes data and computation distribution in NUMA systems difficult. Moreover, many programs have irregular memory access patterns that are hard to predict by processor prefetcher units. The memory system performance as observed for a given program on a specific platform depends also on many algorithm and implementation decisions. The paper shows that a set of simple algorithmic changes coupled with commonly available OS functionality suffice to eliminate data sharing and to regularize the memory access patterns for a subset of the PARSEC parallel benchmarks. These simple source-level changes result in performance improvements of up to 3.1X, but more importantly, they lead to a fairer and more accurate performance evaluation on NUMA-multicore systems. They also illustrate the importance of carefully considering all details of algorithms and architectures to avoid drawing incorrect conclusions.
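The abstract's point about program-level data sharing can be illustrated with a small sketch. This is not the authors' code or the PARSEC changes themselves; it is a minimal model, under assumed sizes, of how the way work items are assigned to threads determines whether memory pages end up shared between threads. The function names and the `items_per_page` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter


def block_partition(n_items, n_threads):
    """Split range(n_items) into contiguous, non-overlapping [start, end) blocks."""
    base, extra = divmod(n_items, n_threads)
    blocks = []
    start = 0
    for t in range(n_threads):
        # The first `extra` threads take one additional item each.
        size = base + (1 if t < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def round_robin_partition(n_items, n_threads):
    """Assign item i to thread i % n_threads (interleaved assignment)."""
    return [list(range(t, n_items, n_threads)) for t in range(n_threads)]


def shared_pages(assignments, items_per_page):
    """Count pages that are touched by more than one thread.

    `assignments` holds one iterable of item indices per thread;
    `items_per_page` is an assumed number of items that fit in one page.
    """
    touches = Counter()
    for indices in assignments:
        pages = {i // items_per_page for i in indices}
        for page in pages:
            touches[page] += 1
    return sum(1 for count in touches.values() if count > 1)


if __name__ == "__main__":
    n_items, n_threads, items_per_page = 4096, 4, 512
    blocked = [range(s, e) for s, e in block_partition(n_items, n_threads)]
    interleaved = round_robin_partition(n_items, n_threads)
    print("shared pages, blocked:    ", shared_pages(blocked, items_per_page))
    print("shared pages, interleaved:", shared_pages(interleaved, items_per_page))
```

With the blocked assignment each thread works on its own contiguous range, so no page is touched by two threads in this model, and each thread's accesses are sequential. With the interleaved assignment every page is touched by every thread. On a real NUMA system the placement of those pages is decided by the OS memory policy (for example, placing a page on the node of the thread that first touches it), so this sketch only models the sharing pattern, not the measured performance.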


Pages: 11-22

Page count: 12

50 entries in total

[1] A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures
Liu, Xu
Mellor-Crummey, John
ACM SIGPLAN NOTICES, 2014, 49 (08) : 259 - 271
[2] Performance Characterization of Spark Workloads on Shared NUMA Systems
Baig, Shuja-ur-Rehman
Amaral, Marcelo
Polo, Jorda
Carrera, David
2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2018), 2018 : 41 - 48
[3] Understanding and Optimizing GPU Cache Memory Performance for Compute Workloads
Choo, Kyoshin
Panlener, William
Jang, Byunghyun
2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014 : 189 - 196
[4] Understanding the performance of storage class memory file systems in the NUMA architecture
Jangwoong Kim
Youngjae Kim
Awais Khan
Sungyong Park
Cluster Computing, 2019, 22 : 347 - 360
[5] Understanding the performance of storage class memory file systems in the NUMA architecture
Kim, Jangwoong
Kim, Youngjae
Khan, Awais
Park, Sungyong
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02) : 347 - 360
[6] PsmArena: Partitioned Shared Memory for NUMA-Awareness in Multithreaded Scientific Applications
Zhang Yang
Aiqing Zhang
Zeyao Mo
Tsinghua Science and Technology, 2021, 26 (03) : 287 - 295
[7] PsmArena: Partitioned Shared Memory for NUMA-Awareness in Multithreaded Scientific Applications
Yang, Zhang
Zhang, Aiqing
Mo, Zeyao
TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (03) : 287 - 295
[8] RPPM: Rapid Performance Prediction of Multithreaded Workloads on Multicore Processors
De Pestel, Sander
Van den Steen, Sam
Akram, Shoaib
Eeckhout, Lieven
2019 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2019 : 257 - 267
[9] Understanding the Behavior of In-Memory Computing Workloads
Jiang, Tao
Zhang, Qianlong
Hou, Rui
Chai, Lin
Mckee, Sally A.
Jia, Zhen
Sun, Ninghui
2014 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2014 : 22 - 30
[10] Optimizing Virtual Machine Consolidation Performance on NUMA Server Architecture for Cloud Workloads
Liu, Ming
Li, Tao
2014 ACM/IEEE 41ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2014 : 325 - 336