Finite element assembly strategies on multi-core and many-core architectures

Cited by: 72

Authors:

Markall, G. R. [2]

Slemmer, A. [2]

Ham, D. A. [3, 4]

Kelly, P. H. J. [2]

Cantwell, C. D. [1]

Sherwin, S. J. [1]

Affiliations:

[1] Univ London Imperial Coll Sci Technol & Med, Dept Aeronaut, London, England

[2] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London, England

[3] Univ London Imperial Coll Sci Technol & Med, Grantham Inst Climate Change, London, England

[4] Univ London Imperial Coll Sci Technol & Med, Dept Earth Sci & Engn, London, England

Source:

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS | 2013, Vol. 71, No. 1

Funding:

Engineering and Physical Sciences Research Council (UK);

Keywords:

FEM; GPU; multi-core; many-core; GPUS; SOLVERS;

DOI:

10.1002/fld.3648

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

We demonstrate that radically differing implementations of finite element methods (FEMs) are needed on multi-core (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our numerical investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high-level structure of the implementation. Making these commitments to achieve high performance for a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many-core architectures, whereas redundancy-free data structures that are accessed indirectly are faster on multi-core architectures. The Addto algorithm for global assembly is optimal on multi-core architectures, whereas the Local Matrix Approach is optimal on many-core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value in making the correct choice of algorithm and data structure when implementing FEMs, spectral element methods and low-order discontinuous Galerkin methods on modern high-performance architectures. Copyright (c) 2012 John Wiley & Sons, Ltd.
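The contrast between the two assembly strategies named in the abstract can be sketched in a few lines. The example below is an illustration only, not the authors' implementation: it assumes a 1D mesh of linear elements with a simple stiffness-type local matrix, uses a dense global matrix for brevity, and every function name is hypothetical. It takes "Addto" to mean scatter-adding local matrices into one global matrix, and the "Local Matrix Approach" to mean leaving the local matrices unassembled and applying them element by element inside the matrix-vector product; the abstract does not define either term, so this reading is an assumption.

```python
# Sketch only: 1D linear elements on a uniform mesh; all names are illustrative.

def local_matrix(h):
    """Stiffness-type local matrix for a 1D linear element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]

def build_mesh(n_elements):
    """Element-to-node map: element e touches nodes e and e + 1."""
    return [(e, e + 1) for e in range(n_elements)]

def assemble_addto(elements, local_mats, n_nodes):
    """Addto (as assumed here): scatter-add local matrices into a global matrix."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    for nodes, Ke in zip(elements, local_mats):
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                A[gi][gj] += Ke[i][j]  # indirect access through the node map
    return A

def matvec_global(A, x):
    """Matrix-vector product with the assembled global matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_lma(elements, local_mats, x):
    """LMA (as assumed here): gather, multiply by each local matrix, scatter-add."""
    y = [0.0] * len(x)
    for nodes, Ke in zip(elements, local_mats):
        x_local = [x[g] for g in nodes]  # gather: shared nodes are read repeatedly
        for i, gi in enumerate(nodes):
            y[gi] += sum(Ke[i][j] * x_local[j] for j in range(len(nodes)))
    return y

n_elements = 4
elements = build_mesh(n_elements)
local_mats = [local_matrix(1.0 / n_elements) for _ in elements]
x = [float(i * i) for i in range(n_elements + 1)]

y_addto = matvec_global(assemble_addto(elements, local_mats, n_elements + 1), x)
y_lma = matvec_lma(elements, local_mats, x)
```

Both paths produce the same vector. The unassembled path stores and multiplies one entry per element for every shared node, which is the kind of redundant data and extra computation the abstract describes, while its per-element data layout is regular rather than indirectly addressed.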

Pages: 80-97

Number of pages: 18

