ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Cited by: 43
Authors
Broquedis, François [1]
Furmento, Nathalie [1]
Goglin, Brice [1]
Wacrenier, Pierre-André [1]
Namyst, Raymond [1]
Affiliation
[1] Univ Bordeaux, LaBRI, INRIA Bordeaux Sud Ouest, F-33405 Talence, France
Keywords
OpenMP; Memory; NUMA; Hierarchical Thread Scheduling; Multi-Core; Performance
DOI
10.1007/s10766-010-0136-3
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Exploiting the full computational power of current hierarchical multiprocessor machines requires carefully distributing threads and data across the underlying non-uniform architecture to avoid remote memory access penalties. Directive-based programming languages such as OpenMP can greatly help to perform such a distribution by giving programmers an easy way to structure the parallelism of their applications and to convey this information to the runtime system. Our runtime, which combines a multi-level thread scheduler with a NUMA-aware memory manager, converts this information into scheduling hints related to thread-memory affinity. These hints enable dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. Several experiments show that mixed solutions (migrating both threads and data) outperform work-stealing-based balancing strategies and next-touch-based data distribution policies. These techniques provide insights into additional optimizations.
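The abstract's central idea, that the nesting of OpenMP parallel regions tells the runtime how threads and data belong together, can be illustrated with standard OpenMP alone. The sketch below is not ForestGOMP code and uses no ForestGOMP API. It assumes a hypothetical machine with 2 NUMA nodes of 2 cores each (the `ASSUMED_*` constants) and relies on the first-touch page placement that Linux applies by default, a simpler policy than the next-touch and thread/data migration strategies the paper compares. The function name `nested_first_touch_sum` is likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical topology (assumption): 2 NUMA nodes with 2 cores each.
   A real program would query the machine instead of hard-coding this. */
#define ASSUMED_NODES 2
#define ASSUMED_CORES_PER_NODE 2

/* Fill a[0..n) with 0, 1, ..., n-1 and return the sum, using two levels of
   parallelism: the outer team stands for one thread per NUMA node, the inner
   team for the cores of that node. Each outer thread owns one contiguous
   block; its inner team both first-touches and later reads that block, so
   under a first-touch policy the block's pages tend to be placed on the node
   whose threads use them (if the threads are not migrated in between). */
double nested_first_touch_sum(double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double total = 0.0;
#ifdef _OPENMP
    omp_set_max_active_levels(2); /* let the inner regions run in parallel */
#endif
    #pragma omp parallel num_threads(ASSUMED_NODES) reduction(+:total)
    {
        int node = 0, nodes = 1; /* sequential fallback without OpenMP */
#ifdef _OPENMP
        node = omp_get_thread_num();
        nodes = omp_get_num_threads();
#endif
        /* Contiguous block [lo, hi) owned by this outer thread. */
        size_t lo = n * (size_t)node / (size_t)nodes;
        size_t hi = n * (size_t)(node + 1) / (size_t)nodes;
        double part = 0.0; /* shared by this outer thread's inner team */

        #pragma omp parallel num_threads(ASSUMED_CORES_PER_NODE)
        {
            /* First touch: the thread that first writes a page decides
               which NUMA node the page is allocated on. */
            #pragma omp for schedule(static)
            for (size_t i = lo; i < hi; i++)
                a[i] = (double)i;

            /* Same iteration count and static schedule in the same team,
               so each thread reads back exactly the elements it wrote. */
            #pragma omp for schedule(static) reduction(+:part)
            for (size_t i = lo; i < hi; i++)
                part += a[i];
        }
        total += part;
    }
    return total;
}
```

On a 1000-element buffer, `nested_first_touch_sum(buf, 1000)` returns 499500.0 (the sum 0 + 1 + ... + 999), whether or not the file is compiled with OpenMP support. The code only expresses the two-level structure; whether each inner team actually stays on one node is left to the runtime or to binding settings such as `OMP_PLACES` and `OMP_PROC_BIND`, which is the kind of placement decision a NUMA-aware runtime like the one described here aims to take over.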
Journal
International Journal of Parallel Programming, vol. 38, 2010, pp. 418–439 (22 pages)