Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Times cited: 3

Authors:

Ono, Kenji [1]

Kato, Toshihiro [2]

Ohshima, Satoshi [3]

Nanri, Takeshi [1]

Affiliations:

[1] Kyushu Univ, Fukuoka, Japan

[2] NEC Corp Ltd, Tokyo, Japan

[3] Nagoya Univ, Nagoya, Aichi, Japan

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2020) | 2020

Keywords:

parallel cyclic reduction; cache bandwidth; line successive over-relaxation

DOI:

10.1145/3368474.3368484

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

In the present paper, we propose SLOR-PCR, an efficient direct-iterative hybrid solver for sparse matrices that combines line successive over-relaxation (SLOR) with parallel cyclic reduction (PCR) and is designed to exploit the scalability of the latest multi-core, many-core, and vector architectures, and we examine its execution performance. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures, structured so that the compiler can readily generate optimized instructions. The proposed hybrid method has high cache reusability, which suits modern architectures with a low bytes-per-flop (B/F) ratio, because efficient use of the cache mitigates the memory bandwidth limitation. The measurements show that the SLOR-PCR solver scales well up to 352 cores in a cc-NUMA environment and, on the SIMD architecture, outperforms the conventional Jacobi and red-black ordering methods by a factor of 3.6 to 8.3. In addition, the maximum speedup in computation time observed on the cc-NUMA architecture with 352 cores was a factor of 6.3.
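The abstract pairs an iterative outer method (line SOR) with a direct inner solver (PCR) for the tridiagonal system along each grid line. The following Python sketch illustrates that structure under stated assumptions: a 2D Poisson model problem with a 5-point stencil and zero Dirichlet boundaries, natural row ordering, and the hypothetical function names `pcr_solve` and `slor_sweep`. It is not the authors' implementation, which targets SIMD and vector hardware and may differ in ordering, data layout, and relaxation details.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[n-1] are ignored (treated as zero).
    """
    n = len(b)
    a = [0.0] + [float(v) for v in a[1:]]
    c = [float(v) for v in c[:-1]] + [0.0]
    b = [float(v) for v in b]
    d = [float(v) for v in d]
    s = 1  # current coupling stride: row i links x[i-s], x[i], x[i+s]
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Each row reads only the previous arrays, so iterations are
        # independent (no loop-carried dependence): SIMD/vector friendly.
        for i in range(n):
            lo, hi = i - s >= 0, i + s < n
            alpha = -a[i] / b[i - s] if lo else 0.0  # eliminates x[i-s]
            gamma = -c[i] / b[i + s] if hi else 0.0  # eliminates x[i+s]
            na[i] = alpha * a[i - s] if lo else 0.0
            nc[i] = gamma * c[i + s] if hi else 0.0
            nb[i] = (b[i] + (alpha * c[i - s] if lo else 0.0)
                     + (gamma * a[i + s] if hi else 0.0))
            nd[i] = (d[i] + (alpha * d[i - s] if lo else 0.0)
                     + (gamma * d[i + s] if hi else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2  # stride doubles: about log2(n) reduction steps in total
    # Every remaining off-diagonal coupling points outside 0..n-1.
    return [d[i] / b[i] for i in range(n)]


def slor_sweep(u, f, h, omega):
    """One line-SOR sweep for -laplace(u) = f on a uniform grid.

    Assumes a 5-point stencil and zero Dirichlet boundaries; u and f are
    lists of rows. Each row is solved exactly with pcr_solve (the direct
    part), then over-relaxed with factor omega (the iterative part).
    """
    ny, nx = len(u), len(u[0])
    zero = [0.0] * nx
    for j in range(ny):
        below = u[j - 1] if j > 0 else zero       # updated in this sweep
        above = u[j + 1] if j + 1 < ny else zero  # from the previous sweep
        rhs = [h * h * f[j][i] + below[i] + above[i] for i in range(nx)]
        line = pcr_solve([-1.0] * nx, [4.0] * nx, [-1.0] * nx, rhs)
        u[j] = [u[j][i] + omega * (line[i] - u[j][i]) for i in range(nx)]
    return u
```

Each PCR step updates every row from the previous step's arrays only, so the inner loop has no dependence between iterations and the solve finishes in about log2(n) steps. That independence is what makes PCR amenable to SIMD and vector units, at the cost of more arithmetic (O(n log n)) than the sequential Thomas algorithm (O(n)).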


Pages: 11-21

Number of pages: 11
