In-Place Data Sliding Algorithms for Many-Core Architectures

Cited by: 10

Authors:

Gomez-Luna, Juan [1]

Chang, Li-Wen [2]

Hwu, Wen-Mei W. [2]

Sung, I-Jui [3]

Guil, Nicolas [4]

Affiliations:

[1] Univ Cordoba, Comp Architecture & Elect, Cordoba, Spain

[2] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA

[3] MulticoreWare Inc, Champaign, IL USA

[4] Univ Malaga, Comp Architecture, Malaga, Spain

Source:

2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) | 2015

Keywords:

in-place; stream compaction; relational algebra;

DOI:

10.1109/ICPP.2015.30

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

In-place data manipulation is highly desirable on many-core architectures with limited on-board memory. This paper addresses the in-place implementation of a class of primitives that move data in one direction, which we call Data Sliding (DS) algorithms. Notable examples are relational algebra primitives (such as select and unique), padding, which inserts empty elements into a data structure, and stream compaction, which reduces memory requirements. Implementing them in place under a bulk synchronous parallel model, such as that of GPUs, is especially challenging because threads executing on different compute units are difficult to synchronize. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes, one for regular and one for irregular DS algorithms. We validate our approaches on a set of five benchmarks and compare them with state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
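To make the idea of an in-place data slide with adjacent synchronization concrete, here is a minimal Python model of in-place stream compaction. It rests on assumptions the abstract does not spell out: each GPU work-group is modelled as one Python thread that owns a fixed-size tile, reads its whole tile into private storage before waiting on its predecessor's flag, and then publishes a running count for its successor. The names `in_place_compact`, `keep`, and `tile_size` are hypothetical. This is an illustrative sketch, not the authors' GPU implementation.

```python
import threading
from typing import Callable, List


def in_place_compact(data: List[int], keep: Callable[[int], bool],
                     tile_size: int = 4) -> int:
    """Slide the elements of `data` that satisfy `keep` to the front, in place.

    Returns the number of kept elements; data[count:] is left holding stale input.
    Each Python thread stands in for one GPU work-group (a modelling assumption).
    """
    num_tiles = (len(data) + tile_size - 1) // tile_size
    # prefix[i] = number of kept elements in tiles 0..i-1, i.e. tile i's output offset.
    prefix = [0] * (num_tiles + 1)
    # flags[i] is set once prefix[i] is valid (the adjacent-synchronization flag).
    flags = [threading.Event() for _ in range(num_tiles + 1)]
    flags[0].set()  # tile 0 writes from offset 0 and waits for nobody

    next_tile = [0]
    id_lock = threading.Lock()

    def work_group() -> None:
        # Claim tile indices in start order, mimicking a dynamic work-group ID
        # taken from an atomic counter (hypothetical mapping).
        with id_lock:
            tile = next_tile[0]
            next_tile[0] += 1
        start = tile * tile_size
        # 1. Read the whole tile into private storage before waiting.
        local = [x for x in data[start:start + tile_size] if keep(x)]
        # 2. Wait until the predecessor has published its running count.
        flags[tile].wait()
        offset = prefix[tile]
        # 3. Publish our running count so the successor can proceed.
        prefix[tile + 1] = offset + len(local)
        flags[tile + 1].set()
        # 4. Slide kept elements left. The target range ends no later than this
        #    tile's end, and every tile up to this one has already been read,
        #    so no unread input is overwritten.
        data[offset:offset + len(local)] = local

    workers = [threading.Thread(target=work_group) for _ in range(num_tiles)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return prefix[num_tiles]


if __name__ == "__main__":
    values = [3, 0, 7, 0, 0, 5, 1, 0, 9, 0]
    count = in_place_compact(values, lambda x: x != 0, tile_size=3)
    print(values[:count])  # [3, 7, 5, 1, 9]
```

The key design point in this sketch is that a tile is read before its work-group blocks on the flag, which is what lets the leftward writes proceed safely without a global barrier. On a real GPU, a dynamically assigned tile index also matters for progress: if indices follow scheduling order, a work-group's predecessor has already started, so spinning on its flag cannot deadlock. Python threads are all scheduled by the operating system, so here the counter only mirrors that structure.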

Pages: 210-219

Page count: 10