Finite element assembly strategies on multi-core and many-core architectures

Cited by: 72
Authors
Markall, G. R. [2]
Slemmer, A. [2]
Ham, D. A. [3, 4]
Kelly, P. H. J. [2]
Cantwell, C. D. [1]
Sherwin, S. J. [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Aeronaut, London, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London, England
[3] Univ London Imperial Coll Sci Technol & Med, Grantham Inst Climate Change, London, England
[4] Univ London Imperial Coll Sci Technol & Med, Dept Earth Sci & Engn, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
FEM; GPU; multi-core; many-core; GPUS; SOLVERS;
DOI
10.1002/fld.3648
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We demonstrate that radically differing implementations of finite element methods (FEMs) are needed on multi-core (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our numerical investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high-level structure of the implementation. Making these commitments to achieve high performance for a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many-core architectures, whereas redundancy-free data structures that are accessed indirectly are faster on multi-core architectures. The Addto algorithm for global assembly is optimal on multi-core architectures, whereas the Local Matrix Approach is optimal on many-core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value in making the correct choice of algorithm and data structure when implementing FEMs, spectral element methods and low-order discontinuous Galerkin methods on modern high-performance architectures. Copyright (c) 2012 John Wiley & Sons, Ltd.
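Illustrative sketch (not taken from the paper): the contrast the abstract draws between the Addto algorithm and the Local Matrix Approach can be shown on a toy 1D mesh of linear elements. This assumes that Addto means scattering each element matrix into one global matrix, and that the Local Matrix Approach means keeping the element matrices unassembled and applying them through a gather, a per-element product and a scatter-add. The function names, the dense global matrix and the 1D stiffness matrix are hypothetical choices made for brevity; a practical code would likely use a sparse format.

```python
import numpy as np

def local_matrices(n_elems, h):
    # Hypothetical example operator: 1D linear-element stiffness matrix,
    # identical for every element of width h.
    ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return np.tile(ke, (n_elems, 1, 1))

def connectivity(n_elems):
    # Element e touches global nodes e and e + 1 (indirect addressing map).
    return np.stack([np.arange(n_elems), np.arange(1, n_elems + 1)], axis=1)

def addto_assemble(Ke, conn, n_nodes):
    # Addto-style assembly: add each local matrix into the global matrix.
    # Dense storage is an assumption made only to keep the sketch short.
    K = np.zeros((n_nodes, n_nodes))
    for e in range(conn.shape[0]):
        idx = conn[e]
        K[np.ix_(idx, idx)] += Ke[e]
    return K

def lma_matvec(Ke, conn, x):
    # Local-matrix-style product: never form K.
    x_local = x[conn]                                # gather nodal values per element
    y_local = np.einsum("eij,ej->ei", Ke, x_local)   # independent per-element products
    y = np.zeros_like(x)
    np.add.at(y, conn, y_local)                      # scatter-add shared nodes
    return y

if __name__ == "__main__":
    n = 4
    Ke, conn = local_matrices(n, 1.0 / n), connectivity(n)
    x = np.linspace(0.0, 1.0, n + 1) ** 2
    K = addto_assemble(Ke, conn, n + 1)
    print(np.allclose(K @ x, lma_matvec(Ke, conn, x)))
```

Both routes yield the same matrix-vector product. The unassembled route stores and multiplies shared-node contributions once per element, which is consistent with the abstract's remark that the Local Matrix Approach requires more computation, while each per-element product is independent and reads contiguous data.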
Pages: 80-97
Page count: 18