In-Place Data Sliding Algorithms for Many-Core Architectures

Cited by: 10

Authors:

Gomez-Luna, Juan [1]

Chang, Li-Wen [2]

Hwu, Wen-Mei W. [2]

Sung, I-Jui [3]

Guil, Nicolas [4]

Affiliations:

[1] Univ Cordoba, Comp Architecture & Elect, Cordoba, Spain

[2] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA

[3] MulticoreWare Inc, Champaign, IL USA

[4] Univ Malaga, Comp Architecture, Malaga, Spain

Source:

2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) | 2015

Keywords:

in-place; stream compaction; relational algebra;

DOI:

10.1109/ICPP.2015.30

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as that of GPUs, is especially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
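To make the notion of moving data "in one direction" concrete, the following is a minimal sequential sketch in Python of two of the primitives the abstract names: stream compaction (an irregular slide toward the front) and row padding (a regular slide toward the back). It is not the paper's GPU algorithm and does not model the adjacent work-group synchronization technique; the function names `compact_in_place` and `pad_rows_in_place` and their parameters are hypothetical, chosen only for illustration.

```python
def compact_in_place(data, keep):
    """Slide the elements accepted by `keep` toward the front of `data`.

    Returns the number of kept elements; data[:n] holds them in order.
    """
    write = 0
    for read in range(len(data)):
        if keep(data[read]):
            # write <= read always holds, so no unread element is overwritten.
            data[write] = data[read]
            write += 1
    return write


def pad_rows_in_place(data, rows, cols, padded_cols, fill=0):
    """Slide each row of a rows x cols matrix so rows become padded_cols wide.

    Assumes `data` already has room for rows * padded_cols elements.
    """
    # Work from the last element backwards: every destination index is at or
    # beyond its source index, so elements not yet moved are never clobbered.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            data[r * padded_cols + c] = data[r * cols + c]
        for c in range(cols, padded_cols):
            data[r * padded_cols + c] = fill


# Usage: drop zeros from a stream, then pad a 2 x 3 matrix to 2 x 4.
stream = [3, 0, 5, 0, 0, 7]
n = compact_in_place(stream, lambda x: x != 0)

matrix = [1, 2, 3, 4, 5, 6, None, None]  # two spare slots at the end
pad_rows_in_place(matrix, rows=2, cols=3, padded_cols=4)
```

In a sequential loop the traversal order alone guarantees that no source element is overwritten before it is read. On a GPU, where many work-groups move data concurrently, that guarantee has to be enforced by synchronization between them, which is the difficulty the abstract points to.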


Pages: 210-219

Number of pages: 10

