NumaMMA: NUMA MeMory Analyzer

Cited by: 14
Authors
Trahay, Francois [1]
Selva, Manuel [2]
Morel, Lionel [3]
Marquet, Kevin [4]
Affiliations
[1] Univ Paris Saclay, SAMOVAR, Telecom SudParis, CNRS, Evry, France
[2] Univ Grenoble Alpes, LIG, Grenoble INP, CNRS, Inria, Grenoble, France
[3] Univ Grenoble Alpes, List, CEA, Grenoble, France
[4] Univ Lyon, CITI, INRIA, INSA Lyon, Villeurbanne, France
Keywords
Performance analysis; NUMA architectures; Data and threads placement; Memory sampling; PLACEMENT; THREAD;
DOI
10.1145/3225058.3225094
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Non-Uniform Memory Access (NUMA) architectures are now common for running High-Performance Computing (HPC) applications. In such architectures, several distinct physical memories are assembled to create a single shared memory. Because there are several physical memories, however, access times are not uniform: they depend on the location of the core performing the memory request and on the location of the target memory. Hence, thread and data placement is crucial to exploiting such architectures efficiently. Profiling tools are needed to help make decisions about this placement. In this work, we propose NUMA MeMory Analyzer (NumaMMA), a new profiling tool for understanding the memory access patterns of HPC applications. NumaMMA combines efficient collection of memory traces using hardware mechanisms with original visualizations that show how memory access patterns evolve over time. The information reported by NumaMMA makes it possible to understand the nature of these access patterns inside each object allocated by the application. We show how NumaMMA can help in understanding the memory access patterns of several HPC applications in order to optimize them, yielding speedups of up to 28% over the standard non-optimized version.
Pages: 10