(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads

Authors
Majo, Zoltan [1]
Gross, Thomas R. [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload's interaction with the memory system). On systems with a non-uniform memory architecture (NUMA) the performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence on performance on systems with aggressive prefetcher units. This paper describes an analysis of the memory system performance of multithreaded programs and shows that some programs are (unintentionally) structured so that they use the memory system of today's NUMA-multicores inefficiently: Programs exhibit program-level data sharing, a performance-limiting factor that makes data and computation distribution in NUMA systems difficult. Moreover, many programs have irregular memory access patterns that are hard to predict by processor prefetcher units. The memory system performance as observed for a given program on a specific platform depends also on many algorithm and implementation decisions. The paper shows that a set of simple algorithmic changes coupled with commonly available OS functionality suffice to eliminate data sharing and to regularize the memory access patterns for a subset of the PARSEC parallel benchmarks. These simple source-level changes result in performance improvements of up to 3.1X, but more importantly, they lead to a fairer and more accurate performance evaluation on NUMA-multicore systems. They also illustrate the importance of carefully considering all details of algorithms and architectures to avoid drawing incorrect conclusions.
Pages: 11-22
Page count: 12