DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems

Cited by: 3
Authors
Agung, Mulya [1 ]
Amrizal, Muhammad Alfian [2 ]
Egawa, Ryusuke [3 ]
Takizawa, Hiroyuki [3 ]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Elect Commun Res Inst, Sendai, Miyagi 9808577, Japan
[3] Tohoku Univ, Cybersci Ctr, Sendai, Miyagi 9808578, Japan
Keywords
High-performance computing; locality; memory congestion; NUMA; process mapping; task mapping; thread mapping; COMMUNICATION; MPI; MANAGEMENT; PLACEMENT; THREAD; TOOLS; MULTI;
DOI
10.1109/ACCESS.2019.2963726
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controllers could cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing the locality can degrade the performance due to memory congestion. In this work, we propose a task mapping method that addresses the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, first, the spatial and temporal communication behaviors of the applications are analyzed from the time-series dataset of communications among the parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause the memory congestion. Finally, this information is used to compute the task mapping to improve the locality and reduce the memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate if the target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
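The three steps named in the abstract (analyze a time series of inter-task communication, cluster tasks to find the ones likely to cause congestion, then compute a mapping) can be illustrated with a small sketch. This is not the DeLoc algorithm itself, whose details are not given in this record. It is a hypothetical toy under stated assumptions: communication is a list of symmetric task-by-task matrices, one per time step; "heavy" tasks are found by a simple one-dimensional two-means split on total volume; heavy tasks are spread across NUMA nodes and the remaining tasks are placed next to their main partners. The names `task_loads`, `heavy_tasks`, and `map_tasks` are invented for this illustration.

```python
def task_loads(comm_series):
    """Total communication volume per task, summed over all time steps."""
    n = len(comm_series[0])
    return [sum(sum(m[i]) for m in comm_series) for i in range(n)]


def heavy_tasks(loads, max_iter=100):
    """Two-means clustering on a 1-D list; returns ids in the high cluster."""
    lo, hi = min(loads), max(loads)
    if lo == hi:
        return set()  # all tasks look alike, nothing to single out
    for _ in range(max_iter):
        threshold = (lo + hi) / 2
        heavy = [x for x in loads if x > threshold]
        light = [x for x in loads if x <= threshold]
        new_lo, new_hi = sum(light) / len(light), sum(heavy) / len(heavy)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return {i for i, x in enumerate(loads) if x > threshold}


def map_tasks(comm_series, num_nodes, cores_per_node):
    """Assumed greedy heuristic: spread heavy tasks, then favor locality."""
    n = len(comm_series[0])
    if n > num_nodes * cores_per_node:
        raise ValueError("more tasks than cores")
    loads = task_loads(comm_series)
    heavy = heavy_tasks(loads)
    # Pairwise volume aggregated over time.
    total = [[sum(m[i][j] for m in comm_series) for j in range(n)]
             for i in range(n)]
    mapping, free = {}, [cores_per_node] * num_nodes

    # Congestion step: each heavy task goes to the node with most free cores.
    for t in sorted(heavy, key=lambda t: (-loads[t], t)):
        node = max(range(num_nodes), key=lambda k: (free[k], -k))
        mapping[t] = node
        free[node] -= 1

    # Locality step: each other task joins the node it communicates with most.
    for t in sorted(set(range(n)) - heavy, key=lambda t: (-loads[t], t)):
        def affinity(k):
            return sum(total[t][u] for u, nd in mapping.items() if nd == k)
        candidates = [k for k in range(num_nodes) if free[k] > 0]
        node = max(candidates, key=lambda k: (affinity(k), free[k], -k))
        mapping[t] = node
        free[node] -= 1
    return mapping


# Two time steps, four tasks: tasks 0 and 1 exchange most of the data.
series = [
    [[0, 8, 1, 0], [8, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 8, 2, 0], [8, 0, 0, 2], [2, 0, 0, 0], [0, 2, 0, 0]],
]
print(map_tasks(series, num_nodes=2, cores_per_node=2))
# prints {0: 0, 1: 1, 2: 0, 3: 1}
```

In this toy input a purely locality-driven mapping would co-locate tasks 0 and 1. The sketch instead separates them onto different nodes and lets tasks 2 and 3 follow their respective partners, which is the kind of trade-off between locality and congestion the abstract describes.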
Pages: 6937-6953
Number of pages: 17
Related papers
50 in total
  • [11] Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process Scheduling
    Tang, Wenda
    Ai, Tianxiang
    Wu, Jie
    PROCEEDINGS OF THE ACM TURING AWARD CELEBRATION CONFERENCE-CHINA 2024, ACM-TURC 2024, 2024, : 6 - 11
  • [12] Congestion-Aware Memory Management on NUMA Platforms: A VMware ESXi case study
    Kotra, Jagadish B.
    Kim, Seongbeom
    Madduri, Kamesh
    Kandemir, Mahmut T.
    PROCEEDINGS OF THE 2017 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2017, : 146 - 155
  • [13] Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems
    Diener, Matthias
    Cruz, Eduardo H. M.
    Navaux, Philippe O. A.
    23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 9 - 16
  • [14] Task and Memory Mapping of Large Size Embedded Applications over NUMA architecture
    Druetto, Alessandro
    Bini, Enrico
    Grosso, Andrea
    Puri, Stefano
    Bacci, Silvio
    Di Natale, Marco
    Paladino, Francesco
    PROCEEDINGS OF 31ST INTERNATIONAL CONFERENCE ON REAL-TIME NETWORKS AND SYSTEMS, RTNS 2023, 2023, : 166 - 176
  • [15] HiNUMA: NUMA-aware Data Placement and Migration in Hybrid Memory Systems
    Duan, Zhuohui
    Liu, Haikun
    Liao, Xiaofei
    Jin, Hai
    Jiang, Wenbin
    Zhang, Yu
    2019 IEEE 37TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2019), 2019, : 367 - 375
  • [16] NUMA-aware memory coloring for multicore real-time systems
    Pan, Xing
    Mueller, Frank
    JOURNAL OF SYSTEMS ARCHITECTURE, 2021, 118
  • [17] Locality-aware task scheduling for homogeneous parallel computing systems
    Muhammad Khurram Bhatti
    Isil Oz
    Sarah Amin
    Maria Mushtaq
    Umer Farooq
    Konstantin Popov
    Mats Brorsson
    Computing, 2018, 100 : 557 - 595
  • [18] Locality-aware task scheduling for homogeneous parallel computing systems
    Bhatti, Muhammad Khurram
    Oz, Isil
    Amin, Sarah
    Mushtaq, Maria
    Farooq, Umer
    Popov, Konstantin
    Brorsson, Mats
    COMPUTING, 2018, 100 (06) : 557 - 595
  • [19] Communication and Congestion Aware Run-Time Task Mapping on Heterogeneous MPSoCs
    Khajekarimi, Elyas
    Hashemi, Mahmoud Reza
    2012 16TH CSI INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS), 2012, : 127 - 132
  • [20] Quantitatively Measuring the Memory Locality Leakage on NUMA Systems based on Instruction-Based-Sampling
    Luo, Qiuming
    Liu, Chenjian
    Kong, Chang
    Cai, Ye
    2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012, : 251 - 256