Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Cited by: 3
Authors
Ono, Kenji [1]
Kato, Toshihiro [2]
Ohshima, Satoshi [3]
Nanri, Takeshi [1]
Affiliations
[1] Kyushu Univ, Fukuoka, Japan
[2] NEC Corp Ltd, Tokyo, Japan
[3] Nagoya Univ, Nagoya, Aichi, Japan
Keywords
parallel cyclic reduction; cache bandwidth; line successive over-relaxation
DOI
10.1145/3368474.3368484
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we propose an efficient direct-iterative hybrid solver for sparse matrices, the SLOR-PCR method, that can exploit the scalability of the latest multi-core, many-core, and vector architectures, and we examine its execution performance. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures, written so that the compiler can readily generate optimized instructions. The proposed hybrid method has high cache reusability, which is favorable for modern architectures with a low byte-per-flop (B/F) ratio, because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance showed that the SLOR-PCR solver scaled well up to 352 cores in the cc-NUMA environment, and that its performance on the SIMD architecture was 3.6 to 8.3 times higher than that of the conventional Jacobi and Red-Black ordering methods. In addition, the maximum observed speedup in computation time was a factor of 6.3 on the cc-NUMA architecture with 352 cores.
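As an illustration of the PCR (parallel cyclic reduction) building block named in the abstract, the following is a minimal textbook sketch for a single tridiagonal system. It is not the authors' SLOR-PCR implementation: the function name `pcr_solve`, the list-based storage, and the scalar Python loop are assumptions made for readability, and the inner loop only stands in for what a compiler would vectorize.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by textbook parallel cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. The inputs are not modified.
    (Hypothetical helper for illustration; not from the paper.)
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row i is updated independently of the others, which is
        # what makes this loop a candidate for SIMD or vector execution.
        for i in range(n):
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            if lo >= 0:
                # Eliminate the coupling to x[lo] using row lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate the coupling to x[hi] using row hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # the remaining couplings are now twice as far apart
    # All off-diagonal couplings point outside the system, so each row
    # has been reduced to b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

The sketch assumes non-zero pivots, which holds, for example, for a diagonally dominant system such as `b[i] = 4`, `a[i] = c[i] = -1`. Each pass doubles the stride, so a system of size n is reduced in about log2(n) passes.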
Pages: 11-21
Page count: 11