In-Place Data Sliding Algorithms for Many-Core Architectures

Cited by: 10
Authors
Gomez-Luna, Juan [1]
Chang, Li-Wen [2]
Hwu, Wen-Mei W. [2]
Sung, I-Jui [3]
Guil, Nicolas [4]
Affiliations
[1] Univ Cordoba, Comp Architecture & Elect, Cordoba, Spain
[2] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA
[3] MulticoreWare Inc, Champaign, IL USA
[4] Univ Malaga, Comp Architecture, Malaga, Spain
Keywords
in-place; stream compaction; relational algebra
DOI
10.1109/ICPP.2015.30
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In-place data manipulation is highly desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that move data in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements into a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as that of GPUs, is especially challenging because of the difficulty of synchronizing threads that execute on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
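To make the idea of one-directional data movement concrete, here is a minimal sequential sketch in Python of two of the primitives the abstract names: stream compaction (elements slide toward the start) and padding (elements slide toward the end). It is not the paper's GPU method and omits the adjacent work-group synchronization entirely. The function names `compact_in_place` and `pad_in_place` are hypothetical and chosen only for illustration.

```python
def compact_in_place(data, keep):
    """Irregular sliding: move kept elements toward index 0.

    Returns the new logical length; slots past it hold stale values.
    """
    write = 0
    for read in range(len(data)):
        if keep(data[read]):
            # write <= read always holds, so no unread element is overwritten.
            data[write] = data[read]
            write += 1
    return write


def pad_in_place(data, rows, cols, pad):
    """Regular sliding: widen each row of a row-major matrix by `pad` slots.

    Empty slots are filled with None. Rows are processed from last to
    first so that every destination index is >= its source index.
    """
    new_cols = cols + pad
    data.extend([None] * (rows * pad))  # grow the same buffer
    for r in range(rows - 1, -1, -1):
        for c in range(new_cols - 1, -1, -1):
            dst = r * new_cols + c
            data[dst] = data[r * cols + c] if c < cols else None
    return data


values = [3, 0, 5, 0, 0, 7]
n = compact_in_place(values, lambda x: x != 0)
print(values[:n])

matrix = [1, 2, 3, 4]  # 2 x 2, row-major
print(pad_in_place(matrix, rows=2, cols=2, pad=1))
```

In the sequential version, the iteration order alone guarantees that no element is overwritten before it is read. In a parallel setting that guarantee has to come from synchronization between the groups of threads that handle neighboring chunks, which is the difficulty the abstract refers to.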
Pages: 210-219
Number of pages: 10
Related papers
50 in total
  • [31] Benchmarking Molecular Dynamics with OpenCL on Many-Core Architectures
    Halver, Rene
    Homberg, Wilhelm
    Sutmann, Godehard
    PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2017), PT II, 2018, 10778 : 244 - 253
  • [32] Exploiting memory allocations in clusterised many-core architectures
    Garibotti, Rafael
    Ost, Luciano
    Butko, Anastasiia
    Reis, Ricardo
    Gamatie, Abdoulaye
    Sassatelli, Gilles
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2019, 13 (04) : 302 - 311
  • [33] TOOLS AND ENVIRONMENTS FOR MULTICORE AND MANY-CORE ARCHITECTURES INTRODUCTION
    Feng, Wu-Chun
    Balaji, Pavan
    COMPUTER, 2009, 42 (12) : 26 - 27
  • [34] Power Efficient Photonic Networks for Many-Core Architectures
    Neel, Brian
    Morris, Randy
    Ditomaso, Dominic
    Kodi, Avinash
    2012 INTERNATIONAL GREEN COMPUTING CONFERENCE (IGCC), 2012
  • [35] Vectorizing unstructured mesh computations for many-core architectures
    Reguly, Istvan Z.
    Laszlo, Endre
    Mudalige, Gihan R.
    Giles, Mike B.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (02) : 557 - 577
  • [36] Hybrid Coarrays: a PGAS Feature for Many-Core Architectures
    Cardellini, Valeria
    Fanfarillo, Alessandro
    Filippone, Salvatore
    Rouson, Damian
    PARALLEL COMPUTING: ON THE ROAD TO EXASCALE, 2016, 27 : 175 - 184
  • [37] Scalable Many-Core Algorithms for Tridiagonal Solvers
    Balogh, Gabor D.
    Flynn, Tobias S.
    Laizet, Sylvain
    Mudalige, Gihan R.
    Reguly, Istvan Z.
    COMPUTING IN SCIENCE & ENGINEERING, 2022, 24 (01) : 26 - 35
  • [38] Scalable Parallel Flash Firmware for Many-core Architectures
    Zhang, Jie
    Kwon, Miryeong
    Swift, Michael
    Jung, Myoungsoo
    PROCEEDINGS OF THE 18TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2020 : 121 - 136
  • [39] Distributed Peak Power Management for Many-core Architectures
    Sartori, John
    Kumar, Rakesh
    DATE: 2009 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, VOLS 1-3, 2009 : 1556 - 1559
  • [40] Architectural Support for Cilk Computations on Many-core Architectures
    Long, Guoping
    Fan, Dongrui
    Zhang, Junchao
    ACM SIGPLAN NOTICES, 2009, 44 (04) : 285 - 286