Impact of memory contention on dynamic scheduling on NUMA multiprocessors

Cited by: 9
Authors
Durand, D
Montaut, T
Kervella, L
Jalby, W
Affiliations
[1] UNIV VERSAILLES,LAB MASI,F-78000 VERSAILLES,FRANCE
[2] INST RECH INFORMAT & SYST ALEATOIRES,F-35042 RENNES,FRANCE
Funding
U.S. National Science Foundation
Keywords
dynamic scheduling; load balancing; memory performance; NUMA multiprocessors; self-scheduling
DOI
10.1109/71.544359
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Self-scheduling is a method for task scheduling in parallel programs, in which each processor acquires a new block of tasks for execution whenever it becomes idle. To get the best performance, the block size must be chosen to balance the scheduling overhead against the load imbalance. To determine the best block size, a better understanding of the role of load imbalance in self-scheduling performance is needed. In this paper we study the effect of memory contention on task duration distributions and, hence, load balancing in self-scheduling on a Nonuniform Memory Access (NUMA) machine. Experimental studies on a BBN TC2000 are used to reveal the strengths and weaknesses of analytical performance models to predict running time and optimal block size. The models are shown to be very accurate for small block sizes. However, the models fail when the block size is large due to a previously unrecognized source of load imbalance. We extend the analytical models to address this failure. The implications for the construction of compilers and runtime systems are discussed.
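The trade-off described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' analytical model and it does not model memory contention; it only assumes that each idle processor grabs the next fixed-size block of tasks and pays a constant per-block scheduling cost. The function name `simulate_self_scheduling` and the parameters `durations`, `num_procs`, `block_size`, and `overhead` are hypothetical names introduced for illustration.

```python
import heapq


def simulate_self_scheduling(durations, num_procs, block_size, overhead):
    """Return the simulated finishing time of a self-scheduled task list.

    Assumptions (illustrative, not taken from the paper):
    - tasks are handed out in order, in fixed-size blocks;
    - every block acquisition costs a constant `overhead`;
    - task durations are given up front and are unaffected by contention.
    """
    if num_procs < 1 or block_size < 1:
        raise ValueError("num_procs and block_size must be at least 1")

    # Min-heap of the times at which each processor next becomes idle.
    idle_times = [0.0] * num_procs
    heapq.heapify(idle_times)

    next_task = 0
    finish = 0.0
    while next_task < len(durations):
        # The earliest idle processor acquires the next block of tasks.
        start = heapq.heappop(idle_times)
        block = durations[next_task:next_task + block_size]
        next_task += block_size
        done = start + overhead + sum(block)
        finish = max(finish, done)
        heapq.heappush(idle_times, done)
    return finish


if __name__ == "__main__":
    tasks = [1.0] * 8
    for k in (1, 2, 4, 8):
        print(k, simulate_self_scheduling(tasks, 2, k, 0.5))
```

With eight unit-length tasks on two processors and an overhead of 0.5 per block, small blocks pay the overhead many times, while a single block of eight leaves one processor idle; an intermediate block size gives the shortest finishing time in this toy setting.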
Pages: 1201 - 1214
Number of pages: 14
Related papers
50 in total
  • [41] A scheduling algorithm for bus-based shared memory multiprocessors
    Kang, OH
    Kim, SG
    CCCT 2003, VOL 2, PROCEEDINGS: COMMUNICATIONS SYSTEMS, TECHNOLOGIES AND APPLICATIONS, 2003, : 1 - 3
  • [42] A task duplication based scheduling algorithm for shared memory multiprocessors
    Kang, OH
    Kim, SG
    PARALLEL COMPUTING, 2003, 29 (01) : 161 - 166
  • [43] A duplication heuristic for static scheduling of tasks on distributed memory multiprocessors
    Chung, YC
    Liu, CC
    Liu, JS
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 1995, 18 (06) : 845 - 855
  • [44] FAST, CONTENTION-FREE COMBINING TREE BARRIERS FOR SHARED-MEMORY MULTIPROCESSORS
    SCOTT, ML
    MELLORCRUMMEY, JM
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 1994, 22 (04) : 449 - 481
  • [45] The Impact of Memory Models on Software Reliability in Multiprocessors
    Jaffe, Alexander
    Moscibroda, Thomas
    Effinger-Dean, Laura
    Ceze, Luis
    Strauss, Karin
    PODC 11: PROCEEDINGS OF THE 2011 ACM SYMPOSIUM PRINCIPLES OF DISTRIBUTED COMPUTING, 2011, : 89 - 98
  • [46] A Data Locality and Memory Contention Analysis Method in Embedded NUMA Multi-core Systems
    Li, Lin
    Fussenegger, Markus
    Cichon, Gordon
    2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC), 2016, : 85 - 92
  • [47] Exploiting Network Locality for CC-NUMA Multiprocessors
    Hung-Chang Hsiao
    Chung-Ta King
    The Journal of Supercomputing, 2001, 18 : 63 - 87
  • [48] Exploiting network locality for CC-NUMA multiprocessors
    Hsiao, HC
    King, CT
    JOURNAL OF SUPERCOMPUTING, 2001, 18 (01) : 63 - 87
  • [49] Switch MSHR: A technique to reduce remote read memory access time in CC-NUMA multiprocessors
    Bhuyan, LN
    Wang, HJ
    IEEE TRANSACTIONS ON COMPUTERS, 2003, 52 (05) : 617 - 632
  • [50] Load balancing for parallel query execution on NUMA multiprocessors
    INRIA Rocquencourt, France
    Distrib Parallel Databases, 1 (99-121):