Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Cited by: 3
Authors
Ono, Kenji [1]
Kato, Toshihiro [2]
Ohshima, Satoshi [3]
Nanri, Takeshi [1]
Affiliations
[1] Kyushu Univ, Fukuoka, Japan
[2] NEC Corp Ltd, Tokyo, Japan
[3] Nagoya Univ, Nagoya, Aichi, Japan
Keywords
parallel cyclic reduction; cache bandwidth; line successive over-relaxation
DOI
10.1145/3368474.3368484
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose an efficient direct-iterative hybrid solver for sparse matrices, the SLOR-PCR method, which exploits the scalability of the latest multi-core, many-core, and vector architectures, and we examine its execution performance. We also present an efficient implementation of the parallel cyclic reduction (PCR) algorithm for SIMD and vector architectures, written so that the compiler can readily generate optimized instructions. The proposed hybrid method has high cache reusability, which is favorable for modern architectures with a low byte-per-flop (B/F) ratio, because efficient use of the cache mitigates the memory bandwidth limitation. Measurements show that the SLOR-PCR solver scales well up to 352 cores in a cc-NUMA environment, and that its performance is higher than that of the conventional Jacobi and Red-Black ordering methods by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was a factor of 6.3 on the cc-NUMA architecture with 352 cores.
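As a rough illustration of the PCR building block named in the abstract, the following is a minimal sketch of textbook parallel cyclic reduction for a single tridiagonal system. It is not the authors' implementation: the function name `pcr_solve`, the list-based storage, and the convention that `a[0]` and `c[-1]` are zero are assumptions made for this example, and it does not show the SLOR coupling or any SIMD-specific layout.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Assumes a[0] == 0, c[-1] == 0, and a matrix (e.g. diagonally
    dominant) for which no pivot b[i] becomes zero.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row i is updated independently of the others in this pass;
        # this independence is what makes PCR amenable to vectorization.
        for i in range(n):
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            if lo >= 0:
                # Eliminate the coupling to x[lo] using row lo.
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Eliminate the coupling to x[hi] using row hi.
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are now gone; each row is b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Usage: a small diagonally dominant system with a known solution.
n = 5
a = [0.0] + [-1.0] * (n - 1)
b = [4.0] * n
c = [-1.0] * (n - 1) + [0.0]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
d = [
    (a[i] * x_true[i - 1] if i > 0 else 0.0)
    + b[i] * x_true[i]
    + (c[i] * x_true[i + 1] if i < n - 1 else 0.0)
    for i in range(n)
]
x = pcr_solve(a, b, c, d)
```

Each pass doubles the stride, so the system decouples after about log2(n) passes. The sketch writes into fresh arrays on every pass, which keeps the per-row updates free of read-after-write hazards.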
Pages: 11-21
Number of pages: 11