Efficient Parallel Multigrid Method on Intel Xeon Phi Clusters

Authors
Nakajima, Kengo [1]
Gerofi, Balazs [2]
Ishikawa, Yutaka [2]
Horikoshi, Masashi [3]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] RIKEN, R CCS, Kobe, Hyogo, Japan
[3] Intel Corp, Tokyo, Japan
Source
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION WORKSHOPS (HPC ASIA 2021 WORKSHOPS) | 2020
Keywords
parallel iterative solvers; multigrid; SELL-C-sigma; light weight kernel
DOI
10.1145/3440722.3440882
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The parallel multigrid method is expected to play an important role in scientific computing on exa-scale supercomputer systems for solving large-scale linear equations with sparse matrices. Because solving sparse linear systems is a very memory-bound process, an efficient method for storing coefficient matrices is crucial. In previous work, the authors applied the sliced ELL method to parallel conjugate gradient solvers with multigrid preconditioning (MGCG) for an application to 3D groundwater flow through heterogeneous porous media (pGW3D-FVM), and obtained excellent performance on large-scale multicore/manycore clusters. In the present work, the authors introduced SELL-C-sigma to the MGCG solver and evaluated the performance of the solver with various types of OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 1,024 nodes of Intel Xeon Phi. Because SELL-C-sigma is suitable for wide-SIMD architectures such as Xeon Phi, the performance improvement over sliced ELL was more than 20%. This is one of the first examples of SELL-C-sigma applied to the forward/backward substitutions in the ILU-type smoother of a multigrid solver. Furthermore, the effects of IHK/McKernel have been investigated, and it achieved an 11% improvement on 1,024 nodes.
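The storage format named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and covers only a sparse matrix-vector product, not the ILU forward/backward substitutions discussed in the paper. It assumes the usual description of SELL-C-sigma: rows are sorted by nonzero count within windows of sigma rows, grouped into chunks of C rows, padded to the longest row in each chunk, and stored column-major within the chunk. The function names, the input layout, and the small test matrix are hypothetical.

```python
def to_sell_c_sigma(rows, C, sigma):
    """Convert a matrix to a SELL-C-sigma-like layout (illustrative only).

    rows: one list of (column, value) pairs per matrix row (hypothetical input).
    Returns (perm, chunk_ptr, chunk_len, cols, vals).
    """
    n = len(rows)
    # Sort rows by descending nonzero count inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r]))
        perm.extend(window)

    chunk_ptr, chunk_len, cols, vals = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous,
        # which is the access pattern a SIMD unit of width C would load.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # zero padding
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, cols, vals


def spmv(fmt, x, C):
    """Compute y = A @ x from the layout built by to_sell_c_sigma."""
    perm, chunk_ptr, chunk_len, cols, vals = fmt
    y = [0.0] * len(perm)
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for lane in range(C):
            i = k * C + lane
            if i >= len(perm):
                break  # last chunk may be only partly filled
            acc = 0.0
            for j in range(width):
                idx = base + j * C + lane
                acc += vals[idx] * x[cols[idx]]
            y[perm[i]] = acc  # undo the row permutation
    return y


# Hypothetical 5x5 example with uneven row lengths.
rows = [
    [(0, 4.0), (1, -1.0)],
    [(0, -1.0), (1, 4.0), (2, -1.0)],
    [(2, 4.0)],
    [(1, -1.0), (3, 4.0), (4, -1.0), (0, 2.0)],
    [(4, 4.0)],
]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
fmt = to_sell_c_sigma(rows, C=2, sigma=4)
y = spmv(fmt, x, C=2)
```

Sorting within a window places rows of similar length in the same chunk, which reduces padding compared with chunking the rows in their original order. With sigma equal to 1 no sorting takes place, and the layout corresponds to a plain sliced ELL format.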
Pages: 46-49
Page count: 4