High Performance Parallelization of COMPSYN on a Cluster of Multicore Processors with GPUs

Cited by: 1

Authors:

Alessi, Ferdinando [1]

Massini, Annalisa [1]

Basili, Roberto [2]

Affiliations:

[1] Sapienza Univ Rome, Dept Comp Sci, Rome, Italy

[2] Ist Nazl Geofis & Vulcanol, Rome, Italy

Source:

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012 | 2012 / Vol. 9

Keywords:

GPU; CUDA; synthetic seismogram

DOI:

10.1016/j.procs.2012.04.103

Chinese Library Classification:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

In this work we propose a high-performance parallelization of the software package COMPSYN, which produces synthetic seismograms, on a cluster of multicore processors with multiple GPUs. To design and implement the high-performance version, we started from a naive parallel version of COMPSYN. The naive version is a simple parallelization on both the device side, using CUDA, and the host side, using MPI and OpenMP. The high-performance version applies several practical CUDA programming techniques and exploits the GPU architecture more deeply, achieving much better performance than the naive version. We compare both GPU versions against the version running on the cluster of multicore processors without invoking the GPUs. The high-performance GPU version achieves a speedup of 25x over the version running on the cluster without GPUs, against 10x for the naive version. Relative to the sequential version, we estimate a speedup of about 380x for the high-performance GPU version, against about 140x for the naive version.


Pages: 966-975

Page count: 10
