DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems

Cited by: 3
Authors
Agung, Mulya [1]
Amrizal, Muhammad Alfian [2]
Egawa, Ryusuke [3]
Takizawa, Hiroyuki [3]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Elect Commun Res Inst, Sendai, Miyagi 9808577, Japan
[3] Tohoku Univ, Cybersci Ctr, Sendai, Miyagi 9808578, Japan
Keywords
High-performance computing; locality; memory congestion; NUMA; process mapping; task mapping; thread mapping; COMMUNICATION; MPI; MANAGEMENT; PLACEMENT; THREAD; TOOLS; MULTI;
DOI
10.1109/ACCESS.2019.2963726
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controllers could cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing the locality can degrade the performance due to memory congestion. In this work, we propose a task mapping method that addresses the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, first, the spatial and temporal communication behaviors of the applications are analyzed from the time-series dataset of communications among the parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause the memory congestion. Finally, this information is used to compute the task mapping to improve the locality and reduce the memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate if the target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
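The three steps named in the abstract (analyze time-series communication, detect tasks likely to cause congestion, compute a mapping) can be illustrated with a minimal Python sketch. This is not the DeLoc algorithm: the abstract does not give its details, so the sketch swaps the paper's data clustering step for a simple above-mean load threshold, and the function name `map_tasks`, its parameters, and the greedy placement rule are all hypothetical choices made for illustration.

```python
def map_tasks(comm_series, num_nodes, cores_per_node):
    """Toy locality- and congestion-aware mapping (illustrative only).

    comm_series: list of square matrices (lists of lists), one per time
    window; comm_series[t][i][j] is the volume exchanged by tasks i and j.
    Returns a dict {task: numa_node}.
    """
    num_tasks = len(comm_series[0])
    if num_tasks > num_nodes * cores_per_node:
        raise ValueError("more tasks than cores")

    # Step 1: aggregate the time series into total pairwise volumes
    # and a per-task communication load.
    total = [[sum(m[i][j] for m in comm_series) for j in range(num_tasks)]
             for i in range(num_tasks)]
    load = [sum(row) for row in total]

    # Step 2 (assumption): a mean-load threshold stands in for the
    # clustering step; tasks above the mean are treated as "heavy".
    mean_load = sum(load) / num_tasks
    heavy = sorted((t for t in range(num_tasks) if load[t] > mean_load),
                   key=lambda t: -load[t])
    light = [t for t in range(num_tasks) if load[t] <= mean_load]

    node_of = {}
    free = [cores_per_node] * num_nodes

    def place(task, node):
        node_of[task] = node
        free[node] -= 1

    def affinity(task, node):
        # Volume between `task` and the tasks already placed on `node`.
        return sum(total[task][u] for u, n in node_of.items() if n == node)

    # Step 3a: spread heavy tasks over the emptiest nodes to balance
    # pressure on memory controllers.
    for t in heavy:
        open_nodes = [n for n in range(num_nodes) if free[n] > 0]
        place(t, max(open_nodes, key=lambda n: (free[n], -n)))

    # Step 3b: place the remaining tasks next to their main partners
    # to preserve locality.
    for t in light:
        open_nodes = [n for n in range(num_nodes) if free[n] > 0]
        place(t, max(open_nodes, key=lambda n: (affinity(t, n), free[n], -n)))

    return node_of


# Two time windows, four tasks; tasks 0 and 1 carry the highest load.
window_a = [[0, 6, 4, 1], [6, 0, 0, 4], [4, 0, 0, 0], [1, 4, 0, 0]]
window_b = [[0, 4, 4, 0], [4, 0, 1, 4], [4, 1, 0, 0], [0, 4, 0, 0]]
mapping = map_tasks([window_a, window_b], num_nodes=2, cores_per_node=2)
print(mapping)
```

In this example tasks 0 and 1 exchange the most data with each other, so a purely locality-based mapping would co-locate them. The sketch instead separates them onto different nodes and pairs each with its lighter partner, which is the kind of trade-off the abstract describes.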
Pages: 6937-6953
Page count: 17
Related papers
50 in total
  • [41] Data-aware task scheduling on heterogeneous hybrid memory multiprocessor systems
    Chen, Junjie
    Li, Kenli
    Tang, Zhuo
    Liu, Chubo
    Wang, Yan
    Li, Keqin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (17): : 4443 - 4459
  • [42] LAMS: A Latency-Aware Memory Scheduling Policy for Modern DRAM Systems
    Liu, Wenjie
    Huang, Ping
    Kun, Tang
    Lu, Tao
    Zhou, Ke
    Li, Chunhua
    He, Xubin
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [43] Performance Constraint-Aware Task Mapping to Optimize Lifetime Reliability of Manycore Systems
    Rathore, Vijeta
    Chaturvedi, Vivek
    Srikanthan, Thambipillai
    2016 INTERNATIONAL GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI), 2016, : 377 - 380
  • [44] Congestion-aware core mapping for Network-on-Chip based systems using betweenness centrality
    Maqsood, Tahir
    Bilal, Kashif
    Madani, Sajjad A.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 82 : 459 - 471
  • [45] Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement
    Psaroudakis, Iraklis
    Scheuer, Tobias
    May, Norman
    Sellami, Abdelkader
    Ailamaki, Anastasia
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (12): : 1442 - 1453
  • [46] Task mapping on Distributed Shared Memory systems using Hopfield neural network
    Liang, TY
    Shieh, CK
    Zhu, WP
    CONFERENCE ON COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS MODELING AND SIMULATION (CNDS'97), 1997, : 37 - 43
  • [47] DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems
    Wang, Junpeng
    Du, Haitao
    Ding, Bo
    Xu, Qi
    Chen, Song
    Kang, Yi
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2023, 28 (03)
  • [48] A Reliability-Aware Address Mapping Strategy for NAND Flash Memory Storage Systems
    Wang, Yi
    Huang, Min
    Shao, Zili
    Chan, Henry C. B.
    Bathen, Luis Angel D.
    Dutt, Nikil D.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2014, 33 (11) : 1623 - 1631
  • [49] Energy-Aware Real-Time Task Scheduling on Local/Shared Memory Systems
    Fu, Chenchen
    Calinescu, Gruia
    Wang, Kai
    Li, Minming
    Xue, Chun Jason
    PROCEEDINGS OF 2016 IEEE REAL-TIME SYSTEMS SYMPOSIUM (RTSS), 2016, : 269 - 278
  • [50] Memory-Aware Genetic Algorithms for Task Mapping on Hard Real-Time Networks-on-Chip
    Still, Lloyd Robert
    Indrusiak, Leandro Soares
    2018 26TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2018), 2018, : 601 - 608