Locality-Aware Parallel Process Mapping for Multi-Core HPC Systems

Cited by: 14
Authors
Hursey, Joshua [1]
Squyres, Jeffrey M. [1]
Dontje, Terry [1]
Affiliations
[1] Oak Ridge Natl Lab, Oak Ridge, TN 37831 USA
Keywords
Process Affinity; Locality; NUMA; MPI; Resource Management
DOI
10.1109/CLUSTER.2011.59
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning the placement of processes within a server's NUMA networking topology to suit the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm supports both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and the user-specified process layout at run-time. As implemented in Open MPI, LAMA provides 362,880 mapping permutations and scales naturally to additional hardware resources as they become available in future architectures.
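The figure of 362,880 equals 9!, which is what one would get if a layout is any ordering of nine hardware levels. The sketch below illustrates that reading under stated assumptions. The level names in `LEVELS`, the functions `count_layouts` and `map_processes`, and the toy machine shape are all hypothetical and are not taken from the paper or from Open MPI. The code only shows how the order in which levels are iterated changes where consecutive ranks land.

```python
from itertools import product
from math import factorial

# Hypothetical level names (assumption): the abstract does not list them.
LEVELS = ["node", "board", "socket", "numa", "l3", "l2", "l1", "core", "hwthread"]


def count_layouts(levels):
    """Number of distinct iteration orders over the given levels."""
    return factorial(len(levels))


def map_processes(shape, order, nprocs):
    """Assign ranks 0..nprocs-1 to hardware slots.

    shape:  dict mapping level name -> number of units at that level
    order:  level names listed from fastest-varying to slowest-varying
    Returns a list where entry i holds the coordinates of rank i.
    If nprocs exceeds the number of slots, only the available slots are used.
    """
    # itertools.product varies its last argument fastest, so reverse the order.
    slow_to_fast = list(reversed(order))
    ranges = [range(shape[level]) for level in slow_to_fast]
    placements = []
    for coords in product(*ranges):
        if len(placements) == nprocs:
            break
        placements.append(dict(zip(slow_to_fast, coords)))
    return placements


if __name__ == "__main__":
    print(count_layouts(LEVELS))
    toy = {"node": 2, "socket": 2, "core": 2}  # toy machine, an assumption
    # Fill the cores of one socket first, then the next socket, then the next node.
    print(map_processes(toy, ["core", "socket", "node"], 4))
    # Scatter consecutive ranks across nodes first.
    print(map_processes(toy, ["node", "socket", "core"], 4))
```

With the first ordering, consecutive ranks share a socket; with the second, they alternate between nodes. That difference in placement is the kind of locality trade-off the abstract describes.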
Pages: 527-531
Page count: 5
Related papers
50 in total
  • [41] Fine-grained locality-aware parallel scheme for anisotropic mesh adaptation
    Rakotoarivelo, Hoby
    Ledoux, Franck
    Pommereau, Franck
    25TH INTERNATIONAL MESHING ROUNDTABLE, 2016, 163 : 123 - 135
  • [42] A Locality-aware Cooperative Distributed Memory Caching for Parallel Data Analytic Applications
    Hung, Chia-Ting
    Chou, Jerry
    Chen, Ming-Hung
    Chung, I-Hsin
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 1111 - 1117
  • [43] Locality-Aware Memory Association for Multi-Target Worksharing in OpenMP
    Scogland, Thomas R. W.
    Feng, Wu-Chun
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014, : 515 - 516
  • [44] An Efficient Data Layout Transformation Algorithm for Locality-Aware Parallel Sparse FFT
    Wang, Cheng
    Chandrasekaran, Sunita
    Chapman, Barbara
    PROCEEDINGS OF IA3 2017: SEVENTH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURES AND ALGORITHMS, 2017,
  • [45] Thermal-aware Scheduling for Data Parallel Workloads on Multi-Core Processors
    Tan, Hengxing
    Ranka, Sanjay
    2014 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATION (ISCC), 2014,
  • [46] Multi-core aware applications in CMS
    Jones, C. D.
    Elmer, P.
    Sexton-Kennedy, L.
    Green, C.
    Baldooci, A.
    INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP 2010), 2011, 331
  • [47] Allocating tasks in multi-core processor based parallel systems
    Liu, Yi
    Zhang, Xin
    Li, He
    Qian, Depei
    2007 IFIP INTERNATIONAL CONFERENCE ON NETWORK AND PARALLEL COMPUTING WORKSHOPS, PROCEEDINGS, 2007, : 748 - +
  • [48] Parallel Computation of Adaptive Filtering Algorithms on Multi-Core Systems
    Dong-hwan Lee
    Jaewoo Ahn
    Wonyong Sung
    Journal of Signal Processing Systems, 2012, 69 : 253 - 265
  • [49] A Parallel FastTrack Data Race Detector on Multi-core Systems
    Song, Young Wn
    Lee, Yann-Hang
    2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, : 387 - 396
  • [50] Design of a Dynamic Parallel Execution Architecture for Multi-core Systems
    Huang, S., 1600, Springer Science and Business Media Deutschland GmbH (21):