Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations

Cited by: 6

Authors:

Aliaga, Jose I. [1]

Dufrechou, Ernesto [2]

Ezzatti, Pablo [2]

Quintana-Orti, Enrique S. [1]

Affiliations:

[1] Univ Jaume I, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain

[2] Univ Republica, Inst Comp, Montevideo, Uruguay

Source:

PARALLEL COMPUTING | 2019 / Volume 85

Keywords:

Sparse linear systems; Iterative Krylov-subspace methods; Data parallelism; ILUPACK preconditioner; Graphics processing units (GPUs); PRECONDITIONERS;

DOI:

10.1016/j.parco.2019.02.005

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on parallel machines. In this work we focus on exploiting the task-level parallelism derived from the structure of the BiCG method, in addition to the data-level parallelism of the internal matrix computations, with the goal of boosting the performance of a GPU (graphics processing unit) implementation of this solver. First, we revisit the use of dual-GPU systems to execute independent stages of the BiCG concurrently on both accelerators, while leveraging the extra memory space to improve the data access patterns. In addition, we extend our ideas to compute the BiCG method efficiently in multicore platforms with a single GPU. In this line, we study the possibilities offered by hybrid CPU-GPU computations, as well as a novel synchronization-free sparse triangular linear solver. The experimental results with the new solvers show important acceleration factors with respect to the previous data-parallel CPU and GPU versions. (C) 2019 Elsevier B.V. All rights reserved.
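The task-level parallelism mentioned in the abstract comes from the structure of BiCG itself: each iteration needs one product with A and one with its transpose, and neither depends on the other. The following is a minimal sketch of that idea in Python with NumPy, not the authors' implementation. It assumes a small dense matrix, omits the ILUPACK preconditioner entirely, and uses a two-worker thread pool as a stand-in for the two GPUs (or the CPU and GPU) described in the abstract. The function name `bicg` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def bicg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCG for A x = b (illustrative sketch only)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x            # residual of the original system
    r_t = r.copy()           # shadow residual (system with A^T)
    p, p_t = r.copy(), r_t.copy()
    rho = r_t @ r
    b_norm = np.linalg.norm(b)

    # Assumption: two worker threads stand in for two devices.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(max_iter):
            if np.linalg.norm(r) <= tol * b_norm:
                break
            # Two independent tasks: products with A and with A^T.
            fut_q = pool.submit(np.dot, A, p)
            fut_qt = pool.submit(np.dot, A.T, p_t)
            q, q_t = fut_q.result(), fut_qt.result()

            alpha = rho / (p_t @ q)
            x += alpha * p
            r -= alpha * q
            r_t -= alpha * q_t

            rho_new = r_t @ r
            beta = rho_new / rho
            p = r + beta * p
            p_t = r_t + beta * p_t
            rho = rho_new
    return x
```

Calling `bicg(A, b)` on a small, well-conditioned nonsymmetric system should return an approximation of the solution. In a preconditioned variant, the preconditioner applications for the two sequences would be independent in the same way, and would presumably be where most of the time is spent.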


Pages: 79-87

Number of pages: 9
