DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems

Cited by: 3
Authors
Agung, Mulya [1]
Amrizal, Muhammad Alfian [2]
Egawa, Ryusuke [3]
Takizawa, Hiroyuki [3]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Elect Commun Res Inst, Sendai, Miyagi 9808577, Japan
[3] Tohoku Univ, Cybersci Ctr, Sendai, Miyagi 9808578, Japan
Keywords
High-performance computing; locality; memory congestion; NUMA; process mapping; task mapping; thread mapping; COMMUNICATION; MPI; MANAGEMENT; PLACEMENT; THREAD; TOOLS; MULTI
DOI
10.1109/ACCESS.2019.2963726
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controllers could cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing the locality can degrade the performance due to memory congestion. In this work, we propose a task mapping method that addresses the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, first, the spatial and temporal communication behaviors of the applications are analyzed from the time-series dataset of communications among the parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause the memory congestion. Finally, this information is used to compute the task mapping to improve the locality and reduce the memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate if the target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
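The three steps named in the abstract (analyze communication, cluster tasks, compute a mapping) can be illustrated with a small Python sketch. This is not the DeLoc algorithm itself: the abstract does not specify the clustering technique or the placement rule, so everything below is an assumption made for illustration. The names `split_heavy` and `map_tasks` are hypothetical, the time dimension is collapsed by simple summation, a one-dimensional two-means split stands in for the paper's clustering step, and a greedy placement stands in for its mapping computation.

```python
from typing import List

Matrix = List[List[float]]  # comm[i][j]: volume sent from task i to task j


def split_heavy(loads: List[float], iters: int = 20) -> List[bool]:
    """1-D two-means clustering; True marks tasks in the high-load cluster.

    Assumption: stands in for the unspecified clustering technique.
    """
    lo, hi = min(loads), max(loads)
    if lo == hi:  # all tasks look alike, nothing to separate
        return [False] * len(loads)
    heavy = [False] * len(loads)
    for _ in range(iters):
        heavy = [abs(x - hi) < abs(x - lo) for x in loads]
        high = [x for x, h in zip(loads, heavy) if h]
        low = [x for x, h in zip(loads, heavy) if not h]
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return heavy


def map_tasks(windows: List[Matrix], num_nodes: int, cores_per_node: int) -> List[int]:
    """Return node_of[i], the NUMA node chosen for task i (illustrative only)."""
    n = len(windows[0])
    if n > num_nodes * cores_per_node:
        raise ValueError("more tasks than cores")

    # Step 1: collapse the time series into one symmetric communication matrix.
    total = [[sum(w[i][j] + w[j][i] for w in windows) for j in range(n)]
             for i in range(n)]
    loads = [sum(row) for row in total]

    # Step 2: find the tasks that are likely to congest a memory controller.
    heavy = split_heavy(loads)

    # Step 3: place tasks from heaviest to lightest.
    node_of = [-1] * n
    free = [cores_per_node] * num_nodes
    for i in sorted(range(n), key=lambda t: -loads[t]):
        candidates = [m for m in range(num_nodes) if free[m] > 0]
        if heavy[i]:
            # Congestion: spread heavy tasks onto the emptiest node.
            node = max(candidates, key=lambda m: free[m])
        else:
            # Locality: join the node holding the most-contacted placed tasks.
            def affinity(m: int) -> float:
                return sum(total[i][j] for j in range(n) if node_of[j] == m)
            node = max(candidates, key=lambda m: (affinity(m), free[m]))
        node_of[i] = node
        free[node] -= 1
    return node_of
```

With four tasks on two dual-core nodes, where tasks 0 and 1 exchange most of the traffic, a purely locality-driven mapping would co-locate tasks 0 and 1. The sketch instead separates them and attaches each light task to the heavy task it talks to, which is the kind of trade-off the abstract describes.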
Pages: 6937-6953
Page count: 17