In-Place Data Sliding Algorithms for Many-Core Architectures

Cited by: 10
Authors
Gomez-Luna, Juan [1]
Chang, Li-Wen [2]
Hwu, Wen-Mei W. [2]
Sung, I-Jui [3]
Guil, Nicolas [4]
Affiliations
[1] Univ Cordoba, Comp Architecture & Elect, Cordoba, Spain
[2] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA
[3] MulticoreWare Inc, Champaign, IL USA
[4] Univ Malaga, Comp Architecture, Malaga, Spain
Keywords
in-place; stream compaction; relational algebra;
DOI
10.1109/ICPP.2015.30
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is especially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
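To make the idea of one-directional data movement concrete, here is a minimal sequential sketch of two of the primitives the abstract names: stream compaction (elements slide toward the front) and padding (elements slide toward the back). It is an illustration only, not the paper's GPU scheme. It does not model work-groups or the adjacent work-group synchronization technique, and the function names `compact_in_place` and `pad_rows_in_place` are hypothetical.

```python
def compact_in_place(data, keep):
    """Slide the elements for which keep(x) is true toward the front.

    Returns the number of elements kept. The write index never passes
    the read index, so no unread element is overwritten.
    """
    write = 0
    for read in range(len(data)):
        if keep(data[read]):
            data[write] = data[read]
            write += 1
    del data[write:]  # drop the now-unused tail
    return write


def pad_rows_in_place(data, rows, cols, pad, fill=0):
    """Widen each row of a row-major rows x cols matrix by `pad` elements.

    Elements slide toward the back, so rows and columns are visited in
    reverse: every destination index is >= its source index, and all
    unread sources lie at smaller indices.
    """
    assert len(data) == rows * cols
    new_cols = cols + pad
    data.extend([fill] * (rows * pad))  # grow the buffer once
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            data[r * new_cols + c] = data[r * cols + c]
        for c in range(cols, new_cols):
            data[r * new_cols + c] = fill
```

In both functions the traversal order is what makes the in-place version safe in sequential code. On a GPU, where many work-groups run concurrently, that ordering has to be enforced by synchronization, which is the difficulty the abstract points to.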
Pages: 210-219
Page count: 10
Related papers
50 in total
  • [41] Adaptive Power Profiling for Many-Core HPC Architectures
    Kelley, Jaimie
    Stewart, Christopher
    Tiwari, Devesh
    Gupta, Saurabh
    2016 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING (ICAC), 2016, : 179 - 188
  • [42] Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures
    Al Farhan, Mohammed A.
    Keyes, David E.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (10) : 2317 - 2332
  • [43] Many-core System-on-Chip: architectures and applications
    Bakhouya, Mohamed
    Daneshtalab, Masoud
    Palesi, Maurizio
    Ghasemzadeh, Hassan
    MICROPROCESSORS AND MICROSYSTEMS, 2016, 43 : 1 - 3
  • [44] Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures
    Zhang, Peng
    Fang, Jianbin
    Yang, Canqun
    Huang, Chun
    Tang, Tao
    Wang, Zheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (08) : 1878 - 1896
  • [45] Visualizing Complex Dynamics in Many-Core Accelerator Architectures
    Ariel, Aaron
    Fung, Wilson W. L.
    Turner, Andrew E.
    Aamodt, Tor M.
    2010 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2010), 2010, : 164 - 174
  • [46] Silicon Photonic Memory Interconnect for Many-Core Architectures
    Wen, Ke
    Guan, Hang
    Calhoun, David M.
    Rumley, Sebastien
    Bergman, Keren
    Donofrio, David
    Shall, John
    2016 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2016,
  • [47] Solving Matrix Equations on Multi-Core and Many-Core Architectures
    Benner, Peter
    Ezzatti, Pablo
    Mena, Hermann
    Quintana-Orti, Enrique S.
    Remon, Alfredo
    ALGORITHMS, 2013, 6 (04) : 857 - 870
  • [48] Revision of Relational Joins for Multi-Core and Many-Core Architectures
    Krulis, Martin
    Yaghob, Jakub
    DATESO 2011: DATABASES, TEXTS, SPECIFICATIONS, OBJECTS, 2011, 706 : 229 - 240
  • [49] A Cross-Core Performance Model for Heterogeneous Many-Core Architectures
    Pinheiro, Rui
    Roma, Nuno
    Tomas, Pedro
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 101 - 111
  • [50] RTL Test Generation on Multi-Core and Many-Core Architectures
    Varadarajan, Aravind Krishnan
    Hsiao, Michael S.
    2019 32ND INTERNATIONAL CONFERENCE ON VLSI DESIGN AND 2019 18TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS (VLSID), 2019, : 100 - 105