In-Place Data Sliding Algorithms for Many-Core Architectures

Cited by: 10
Authors
Gomez-Luna, Juan [1]
Chang, Li-Wen [2]
Hwu, Wen-Mei W. [2]
Sung, I-Jui [3]
Guil, Nicolas [4]
Affiliations
[1] Univ Cordoba, Comp Architecture & Elect, Cordoba, Spain
[2] Univ Illinois, Elect & Comp Engn, Urbana, IL 61801 USA
[3] MulticoreWare Inc, Champaign, IL USA
[4] Univ Malaga, Comp Architecture, Malaga, Spain
Keywords
in-place; stream compaction; relational algebra
DOI
10.1109/ICPP.2015.30
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is especially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of five benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
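To illustrate the one-directional data movement the abstract describes, here is a minimal sequential sketch of in-place stream compaction in Python. It is not the paper's GPU implementation and does not model the adjacent work-group synchronization technique. The function name `compact_in_place` and the `keep` predicate are assumptions introduced only for this example. The sketch shows why sliding in a single direction is safe without a second buffer: the write position never overtakes the read position.

```python
def compact_in_place(data, keep):
    """Slide the elements accepted by `keep` toward the front of `data`.

    Returns the number of elements kept. Illustrative sequential sketch only;
    a parallel version must also coordinate neighbouring work-groups.
    """
    write = 0
    for read in range(len(data)):
        value = data[read]
        if keep(value):
            # write <= read always holds, so no unread element is overwritten.
            data[write] = value
            write += 1
    # Everything past `write` is stale; release it to reduce memory use.
    del data[write:]
    return write


values = [3, 0, 0, 7, 0, 5]
kept = compact_in_place(values, lambda x: x != 0)
print(values, kept)
```

On a GPU the same invariant has to hold across work-groups that run concurrently, which is where the synchronization difficulty mentioned in the abstract comes from.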
Pages: 210-219
Number of pages: 10