Locality-Aware Parallel Process Mapping for Multi-Core HPC Systems

Cited by: 14
Authors
Hursey, Joshua [1]
Squyres, Jeffrey M. [1]
Dontje, Terry [1]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
Process Affinity; Locality; NUMA; MPI; Resource Management
DOI
10.1109/CLUSTER.2011.59
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning the placement of processes within a server's NUMA networking topology to suit the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm supports both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and the user-specified process layout at run-time. As implemented in Open MPI, LAMA provides 362,880 mapping permutations and can naturally scale out to additional hardware resources as they become available in future architectures.
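The figure of 362,880 equals 9!, which is what one would get by permuting nine resource levels. The sketch below is a minimal illustration under that assumption, not the paper's implementation: the level names in `LEVELS`, and the helpers `count_layouts` and `toy_map`, are hypothetical and do not come from the abstract. It counts the orderings and shows, on a toy three-level topology, how the chosen ordering changes where consecutive ranks land.

```python
from itertools import permutations

# Hypothetical level names: the abstract gives only the count (362,880 = 9!),
# so this list of nine levels is an assumption made for illustration.
LEVELS = ["node", "board", "socket", "numa", "l3", "l2", "l1", "core", "hwthread"]


def count_layouts(levels):
    """Count the distinct orderings of the given resource levels."""
    return sum(1 for _ in permutations(levels))


def toy_map(num_procs, sizes, order):
    """Place ranks by iterating the levels in `order`, first level varying fastest.

    sizes: dict mapping level name -> number of resources at that level.
    Returns one dict (level name -> index) per rank.
    """
    placements = []
    for rank in range(num_procs):
        remainder = rank
        coords = {}
        for level in order:
            coords[level] = remainder % sizes[level]
            remainder //= sizes[level]
        placements.append(coords)
    return placements


total = count_layouts(LEVELS)

# Toy topology (assumed): 2 nodes, 2 sockets per node, 2 cores per socket.
sizes = {"node": 2, "socket": 2, "core": 2}
by_core = toy_map(4, sizes, ["core", "socket", "node"])  # fill cores first
by_node = toy_map(4, sizes, ["node", "socket", "core"])  # spread across nodes first
```

With cores varying fastest, ranks 0 and 1 share a socket; with nodes varying fastest, they land on different nodes. That difference in neighbor placement is the kind of choice a layout ordering controls.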
Pages: 527-531
Number of pages: 5
Related papers
50 in total
  • [21] Parallel Skyline Queries on Multi-Core Systems
    Liou, Meng-Zong
    Shu, Yi-Teng
    Chen, Wei-Mei
    2013 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2013, : 287 - 292
  • [22] A Communication-Aware Solution Framework for Mapping AUTOSAR Runnables on Multi-core Systems
    Faragardi, Hamid Reza
    Lisper, Bjorn
    Sandstrom, Kristian
    Nolte, Thomas
    2014 IEEE EMERGING TECHNOLOGY AND FACTORY AUTOMATION (ETFA), 2014,
  • [23] Locality-Aware Scheduling of Independent Tasks for Runtime Systems
    Gonthier, Maxime
    Marchal, Loris
    Thibault, Samuel
    EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS, 2022, 13098 : 5 - 16
  • [24] Locality-aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems
    Belayneh, Leul
    Ye, Haojie
    Chen, Kuan-Yu
    Blaauw, David
    Mudge, Trevor
    Dreslinski, Ronald
    Talati, Nishil
    PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 304 - 316
  • [25] NBTI Aware Workload Balancing in Multi-core Systems
    Sun, Jin
    Kodi, Avinash
    Louri, Ahmed
    Wang, Janet M.
    ISQED 2009: PROCEEDINGS 10TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, VOLS 1 AND 2, 2009, : 833 - +
  • [26] Towards locality-aware DHT for fast mapping service in future Internet
    Wang, Peng
    Lan, Julong
    Hu, Yuxiang
    Chen, Shuqiao
    COMPUTER COMMUNICATIONS, 2015, 66 : 14 - 24
  • [27] Nested parallelism for multi-core HPC systems using Java
    Shafi, Aamir
    Carpenter, Bryan
    Baker, Mark
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2009, 69 (06) : 532 - 545
  • [28] Process Variation Aware Performance Modeling and Dynamic Power Management for Multi-Core Systems
    Garg, Siddharth
    Marculescu, Diana
    Herbert, Sebastian X.
    2010 IEEE AND ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2010, : 89 - 92
  • [29] PARALLEL FPGA TECHNOLOGY MAPPING USING MULTI-CORE ARCHITECTURES
    Kennings, Andrew
    Ravishankar, Chirag
    2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2011, : 274 - 279
  • [30] Work Stealing for Multi-core HPC Clusters
    Ravichandran, Kaushik
    Lee, Sangho
    Pande, Santosh
    EURO-PAR 2011 PARALLEL PROCESSING, PT 1, 2011, 6852 : 205 - 217