Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Cited by: 3

Authors:

Ono, Kenji [1]

Kato, Toshihiro [2]

Ohshima, Satoshi [3]

Nanri, Takeshi [1]

Affiliations:

[1] Kyushu Univ, Fukuoka, Japan

[2] NEC Corp Ltd, Tokyo, Japan

[3] Nagoya Univ, Nagoya, Aichi, Japan

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2020) | 2020

Keywords:

parallel cyclic reduction; cache bandwidth; line successive over-relaxation

DOI:

10.1145/3368474.3368484

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In the present paper, we propose an efficient direct-iterative hybrid solver for sparse matrices that can derive the scalability of the latest multi-core, many-core, and vector architectures and examine the execution performance of the proposed SLOR-PCR method. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures so that it is easy to output instructions optimized by the compiler. The proposed hybrid method has high cache reusability, which is favorable for modern low B/F architecture because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance revealed that the SLOR-PCR solver showed excellent scalability up to 352 cores on the cc-NUMA environment, and the achieved performance was higher than that of the conventional Jacobi and Red-Black ordering method by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was observed to be a factor of 6.3 on the cc-NUMA architecture with 352 cores.
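For readers unfamiliar with the PCR (parallel cyclic reduction) building block named in the abstract, the following is a minimal textbook sketch of PCR for a single tridiagonal system. It is not the authors' implementation: the function name `pcr_solve` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0] = c[n-1] = 0`) are assumptions made for illustration, and no SIMD or cache optimization is attempted.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction (PCR).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[n-1] == 0. Hypothetical helper for illustration.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        # Every row is updated independently from the previous step's values,
        # which is what makes the inner loop amenable to vectorization.
        na, nb, nc, nd = a[:], b[:], c[:], d[:]
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates the coupling to x[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
                na[i] = alpha * a[lo]      # new coupling to x[i - 2*stride]
            else:
                na[i] = 0.0
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates the coupling to x[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
                nc[i] = gamma * c[hi]      # new coupling to x[i + 2*stride]
            else:
                nc[i] = 0.0
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are now zero, so each row is b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Usage: the 1-D Laplacian stencil (-1, 2, -1) with a known solution 1..5.
x = pcr_solve([0, -1, -1, -1, -1], [2] * 5, [-1, -1, -1, -1, 0], [0, 0, 0, 0, 6])
```

Each pass doubles the stride, so the system decouples in about log2(n) passes, and every row within a pass can be processed independently of the others.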

Pages: 11 / 21

Number of pages: 11
