Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations

Cited by: 6
Authors
Aliaga, Jose I. [1]
Dufrechou, Ernesto [2]
Ezzatti, Pablo [2]
Quintana-Orti, Enrique S. [1]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain
[2] Univ Republica, Inst Comp, Montevideo, Uruguay
Keywords
Sparse linear systems; Iterative Krylov-subspace methods; Data parallelism; ILUPACK preconditioner; Graphics processing units (GPUs); Preconditioners
DOI
10.1016/j.parco.2019.02.005
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance on parallel machines. In this work we focus on exploiting the task-level parallelism derived from the structure of the BiCG method, in addition to the data-level parallelism of the internal matrix computations, with the goal of boosting the performance of a GPU (graphics processing unit) implementation of this solver. First, we revisit the use of dual-GPU systems to execute independent stages of the BiCG concurrently on both accelerators, while leveraging the extra memory space to improve the data access patterns. In addition, we extend our ideas to compute the BiCG method efficiently on multicore platforms with a single GPU. In this line, we study the possibilities offered by hybrid CPU-GPU computations, as well as a novel synchronization-free sparse triangular linear solver. The experimental results with the new solvers show important acceleration factors with respect to the previous data-parallel CPU and GPU versions. © 2019 Elsevier B.V. All rights reserved.
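The task-level parallelism mentioned in the abstract comes from the fact that each BiCG iteration updates two recurrences, one driven by A and one by its transpose, whose matrix-vector products do not depend on each other. The following is a minimal illustrative sketch of that structure only, assuming an unpreconditioned BiCG on a small dense matrix in pure Python; it is not the ILUPACK or GPU implementation from the paper, and the helper names (`matvec`, `transpose`, `dot`, `bicg`) are hypothetical. Two worker threads stand in for the two accelerators.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(A, v):
    """Dense matrix-vector product (stand-in for a sparse kernel)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCG with the two independent products run as tasks."""
    At = transpose(A)
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the initial guess x = 0
    rt = list(r)                # shadow residual (assumed choice: rt = r)
    p, pt = list(r), list(rt)
    rho = dot(rt, r)
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(max_iter):
            if dot(r, r) ** 0.5 < tol:
                break
            # Independent stages: A*p and A^T*pt share no data dependencies,
            # so they can be issued concurrently (one per device in a
            # dual-accelerator setting).
            fq = pool.submit(matvec, A, p)
            fqt = pool.submit(matvec, At, pt)
            q, qt = fq.result(), fqt.result()
            alpha = rho / dot(pt, q)
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * qi for ri, qi in zip(r, q)]
            rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
            rho_new = dot(rt, r)
            beta = rho_new / rho
            p = [ri + beta * pi for ri, pi in zip(r, p)]
            pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
            rho = rho_new
    return x


# Small nonsymmetric example system (illustrative data only).
x = bicg([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

In a preconditioned variant, the preconditioner applications for the two recurrences would be similarly independent; this sketch omits them and does not guard against BiCG breakdown (a zero denominator).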
Pages: 79-87
Number of pages: 9
Related papers
12 in total
  • [1] Accelerating hybrid TDPO and TDEEC by Multi-GPU and Multi-CPU Cooperation
    Zhao, Wei
    Xu, Le
    Li, Rui
    Shi, Xiaowei
    2016 IEEE MTT-S INTERNATIONAL CONFERENCE ON NUMERICAL ELECTROMAGNETIC AND MULTIPHYSICS MODELING AND OPTIMIZATION (NEMO), 2016
  • [2] Accelerating hyper-spectral data processing on the multi-CPU and multi-GPU heterogeneous computing platform
    Zhang, Lei
    Gao, Jiao Bo
    Hu, Yu
    Wang, Ying Hui
    Sun, Ke Feng
    Cheng, Juan
    Sun, Dan
    Li, Yu
    SECOND INTERNATIONAL CONFERENCE ON PHOTONICS AND OPTICAL ENGINEERING, 2017, 10256
  • [3] Using MATLAB's Parallel Processing Toolbox for Multi-CPU and Multi-GPU Accelerated FDTD Simulations
    Weiss, Alec J.
    Elsherbeni, Atef Z.
    Demir, Veysel
    Hadi, Mohammed F.
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2019, 34 (05): 724-730
  • [4] Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference
    Xiang, Yecheng
    Kim, Hyoseung
    2019 IEEE 40TH REAL-TIME SYSTEMS SYMPOSIUM (RTSS 2019), 2019: 392-405
  • [5] Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
    Lee, Janghaeng
    Samadi, Mehrzad
    Park, Yongjun
    Mahlke, Scott
    2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013: 245-255
  • [6] Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments
    Trong-Tuan Vu
    Derbel, Bilel
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 56: 95-109
  • [7] Time Performance Analysis of Multi-CPU and Multi-GPU in Big Data Clustering Computation
    Adiyoso, Widiarto
    Krisnadhi, Adila
    Wibisono, Ari
    Purbarani, Sumarsih Condroayu
    Saraswati, Anindhita Dwi
    Putri, Annissa Fildzah Rafi
    Saladdin, Ibad Rahadian
    Anwar, S. Reyneta Carissa
    2018 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2018: 113-116
  • [8] Efficient CPU-GPU Work Sharing for Data-Parallel JavaScript Workloads
    Piao, Xianglan
    Kim, Channoh
    Oh, Younghwan
    Kim, Hanjun
    Lee, Jae W.
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014: 357-358
  • [9] Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications
    Zhong, Ziming
    Rychkov, Vladimir
    Lastovetsky, Alexey
    2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2012: 191-199
  • [10] Accelerating Yade's poromechanical coupling with matrix factorization reuse, parallel task management, and GPU computing
    Caulk, Robert A.
    Catalano, Emanuele
    Chareyre, Bruno
    COMPUTER PHYSICS COMMUNICATIONS, 2020, 248