Efficient GPU Implementation of Affine Index Permutations on Arrays

Times cited: 0
Authors
Bouverot-Dupuis, Mathis [1]
Sheeran, Mary [2]
Affiliations
[1] ENS Paris, Paris, France
[2] Chalmers Univ, Gothenburg, Sweden
Source
PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023
Funding
Swedish Research Council
Keywords
GPU; data-parallelism; functional languages; algorithms
DOI
10.1145/3609024.3609411
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Optimal use of the memory system is a key element of fast GPU algorithms. Unfortunately, many common algorithms fail in this regard despite exhibiting highly regular memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels whose speed is comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
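To make the abstract's permutation class concrete: a BMMC permutation of an array of length 2^n is usually defined by an invertible n×n bit matrix A and an n-bit vector c, so that the element at index x moves to index y = Ax ⊕ c, with arithmetic over GF(2) (AND for multiplication, XOR for addition). The Python sketch below is only a sequential CPU reference for that index mapping, assuming this "destination" convention and least-significant-bit-first index bits; it does not reproduce the authors' GPU kernels, and the names `bmmc_target` and `bmmc_permute` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bmmc_target(rows: Sequence[int], c: int, x: int) -> int:
    """Return A·x XOR c over GF(2) for an n-bit index x.

    Assumed encoding (hypothetical): the n x n bit matrix A is stored as n row
    bitmasks, where bit j of rows[i] is entry A[i][j]; bit 0 of x is its LSB.
    """
    y = 0
    for i, row in enumerate(rows):
        # Output bit i is the dot product of row i with x, taken mod 2.
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c  # the "complement" step: flip the bits selected by c


def bmmc_permute(rows: Sequence[int], c: int, xs: Sequence[T]) -> List[T]:
    """Move the element at index x to index A·x XOR c (sequential reference)."""
    n = len(rows)
    if len(xs) != 1 << n:
        raise ValueError("array length must be 2**n for an n x n matrix")
    if not 0 <= c < len(xs):
        raise ValueError("complement vector c must fit in n bits")
    targets = [bmmc_target(rows, c, x) for x in range(len(xs))]
    if len(set(targets)) != len(xs):
        # x -> A·x XOR c is a bijection exactly when A is invertible over GF(2).
        raise ValueError("matrix is not invertible over GF(2)")
    out: List[T] = [xs[0]] * len(xs)  # placeholders, every slot is overwritten
    for x, y in enumerate(targets):
        out[y] = xs[x]
    return out


if __name__ == "__main__":
    # Bit reversal of 3-bit indices: output bit i is input bit 2 - i.
    print(bmmc_permute([0b100, 0b010, 0b001], 0, list(range(8))))
    # prints [0, 4, 2, 6, 1, 5, 3, 7]
    # Identity matrix with c = 0b111 reverses the array.
    print(bmmc_permute([0b001, 0b010, 0b100], 0b111, list(range(8))))
    # prints [7, 6, 5, 4, 3, 2, 1, 0]
```

Bit reversal, array reversal, and transposition of a row-major 2^r × 2^s matrix are all special cases (for transposition, A is a permutation matrix that rotates the index bits). A sequential loop like this is handy as a correctness oracle for a GPU kernel, but it says nothing about memory coalescing, which is the concern the paper addresses.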
Pages: 15-28
Page count: 14