Modeling Stencil Computations on Modern HPC Architectures

Cited by: 6
Authors
de la Cruz, Raul [1]
Araya-Polo, Mauricio [2]
Affiliations
[1] Barcelona Supercomp Ctr, CASE Dept, Barcelona, Spain
[2] Shell Int Explorat & Prod Inc, Houston, TX USA
Keywords
Stencil computation; FD; Modeling; HPC; Prefetching; Spatial blocking; Semi-stencil; Multi-core; Intel Xeon Phi; PERFORMANCE-MODEL
DOI
10.1007/978-3-319-17248-4_8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Stencil computations are widely used for solving Partial Differential Equations (PDEs) explicitly by Finite Difference schemes. Depending on the governing equation, the stencil solver alone can represent up to 90% of the overall elapsed time, and moving data back and forth between memory and the CPU is a major concern. Therefore, the development and analysis of source code modifications that can effectively use the memory hierarchy of modern architectures is crucial. Performance models help expose bottlenecks and predict suitable tuning parameters in order to boost stencil performance on any given platform. To achieve that, the following two considerations need to be accurately modeled. First, modern architectures, such as Intel Xeon Phi, sport multi- or many-core processors with shared multi-level caches featuring one or several prefetching engines. Second, algorithmic optimizations, such as spatial blocking or Semi-stencil, have complex behaviors that follow the intricacy of the above-described modern architectures. In this work, a previously published performance model is extended to effectively capture these architectural and algorithmic characteristics. The extended model results show an accuracy error ranging from 5% to 15%.
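As a rough illustration of the spatial blocking mentioned in the abstract, the following minimal sketch applies a 5-point averaging stencil to a 2D grid, once with plain loops and once with the loops tiled into blocks. It is not the authors' performance model and not the Semi-stencil algorithm; the function names `sweep_naive` and `sweep_blocked`, the block sizes `bj` and `bi`, and the choice of stencil are assumptions made only for this example. Pure Python gains no speed from blocking, so the code shows the loop structure only.

```python
def sweep_naive(grid):
    """One sweep of a 5-point averaging stencil over the grid interior."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = 0.25 * (grid[j - 1][i] + grid[j + 1][i]
                                + grid[j][i - 1] + grid[j][i + 1])
    return out


def sweep_blocked(grid, bj=4, bi=4):
    """Same sweep, with the interior traversed in bj x bi tiles.

    bj and bi are hypothetical tuning parameters; in a compiled language
    they would be chosen so that a tile's working set fits in cache.
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for jj in range(1, ny - 1, bj):          # loop over tiles
        for ii in range(1, nx - 1, bi):
            for j in range(jj, min(jj + bj, ny - 1)):      # loop inside a tile
                for i in range(ii, min(ii + bi, nx - 1)):
                    out[j][i] = 0.25 * (grid[j - 1][i] + grid[j + 1][i]
                                        + grid[j][i - 1] + grid[j][i + 1])
    return out


if __name__ == "__main__":
    # Small synthetic input; both traversals must give the same result.
    g = [[float((3 * j + 7 * i) % 11) for i in range(13)] for j in range(10)]
    print(sweep_naive(g) == sweep_blocked(g, bj=3, bi=5))
```

Both functions evaluate the same expression at every interior point, so only the traversal order differs. That traversal order is the kind of tuning parameter a performance model would be asked to predict.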
Pages: 149-171
Page count: 23