ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Cited by: 43

Authors:

Broquedis, Francois [1]

Furmento, Nathalie [1]

Goglin, Brice [1]

Wacrenier, Pierre-Andre [1]

Namyst, Raymond [1]

Affiliations:

[1] Univ Bordeaux, LaBRI, INRIA Bordeaux Sud Ouest, F-33405 Talence, France

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2010 / Vol. 38 / Issue 5-6

Keywords:

OpenMP; Memory; NUMA; Hierarchical Thread Scheduling; Multi-Core; PERFORMANCE;

DOI:

10.1007/s10766-010-0136-3

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture so as to avoid remote memory access penalties. Directive-based programming languages such as OpenMP can greatly help to perform such a distribution by providing programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system. Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into scheduling hints related to thread-memory affinity issues. These hints enable dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. Several experiments show that mixed solutions (migrating both threads and data) outperform work-stealing-based balancing strategies and next-touch-based data distribution policies. These techniques provide insights about additional optimizations.

Citation

Pages: 418-439

Page count: 22

50 items in total

[1] ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures
François Broquedis
Nathalie Furmento
Brice Goglin
Pierre-André Wacrenier
Raymond Namyst
[J]. International Journal of Parallel Programming, 2010, 38 : 418 - 439
[2] Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
Clet-Ortega, Jerome
Carribault, Patrick
Perache, Marc
[J]. EURO-PAR 2014 PARALLEL PROCESSING, 2014, 8632 : 596 - 607
[3] OpenMP and NUMA Architectures I: Investigating memory placement on the SGI origin 3000
Robertson, N
Rendell, A
[J]. COMPUTATIONAL SCIENCE - ICCS 2003, PT IV, PROCEEDINGS, 2003, 2660 : 648 - 656
[4] Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
Broquedis, Francois
Furmento, Nathalie
Goglin, Brice
Namyst, Raymond
Wacrenier, Pierre-Andre
[J]. EVOLVING OPENMP IN AN AGE OF EXTREME PARALLELISM, 2009, 5568 : 79 - +
[5] SIMT/OMP: A toolset to study and exploit memory locality of OpenMP applications on NUMA architectures
Tao, J
Schulz, M
Karl, W
[J]. SHARED MEMORY PARALLEL PROGRAMMING WITH OPENMP, 2005, 3349 : 41 - 52
[6] An efficient OpenMP runtime system for hierarchical architectures
Thibault, Samuel
Broquedis, Francois
Goglin, Brice
Namyst, Raymond
Wacrenier, Pierre-Andre
[J]. PRACTICAL PROGRAMMING MODEL FOR THE MULTI-CORE ERA, PROCEEDINGS, 2008, 4935 : 161 - 172
[7] Program development environment for OpenMP programs on ccNUMA architectures
Chapman, B
Hernandez, O
Patil, A
Prabhakar, A
[J]. LARGE-SCALE SCIENTIFIC COMPUTING, 2001, 2179 : 210 - 217
[8] OpenMP on multicore architectures
Terboven, Christian
an Mey, Dieter
Sarholz, Samuel
[J]. PRACTICAL PROGRAMMING MODEL FOR THE MULTI-CORE ERA, PROCEEDINGS, 2008, 4935 : 54 - 64
[9] On the performance of BWA on NUMA architectures
Lenis, Josefina
Senar, Miquel Angel
[J]. 2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015 : 236 - 241
[10] OpenMP task scheduling strategies for multicore NUMA systems
Olivier, Stephen L.
Porterfield, Allan K.
Wheeler, Kyle B.
Spiegel, Michael
Prins, Jan F.
[J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2012, 26 (02) : 110 - 124
