Efficient GPU Implementation of Affine Index Permutations on Arrays

Cited by: 0

Authors:

Bouverot-Dupuis, Mathis [1]

Sheeran, Mary [2]

Affiliations:

[1] ENS Paris, Paris, France

[2] Chalmers Univ, Gothenburg, Sweden

Source:

PROCEEDINGS OF THE 11TH ACM SIGPLAN INTERNATIONAL WORKSHOP ON FUNCTIONAL HIGH-PERFORMANCE AND NUMERICAL COMPUTING, FHPNC 2023 | 2023

Funding:

Swedish Research Council;

Keywords:

GPU; data-parallelism; functional languages; ALGORITHMS;

DOI:

10.1145/3609024.3609411

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
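To make the permutation class concrete, here is a minimal CPU sketch of a BMMC permutation, using the standard definition in which an n-bit source index x is sent to A·x ⊕ c over GF(2), with A an invertible n×n bit matrix and c an n-bit complement vector. It is not the paper's GPU kernel: the function names, the packing of each row of A into an integer, and the scatter convention (the element at x moves to the mapped index) are assumptions made for illustration, and the code does not check that A is invertible.

```python
def bmmc_index(rows, c, x):
    """Map index x to A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int,
    where bit j of rows[i] holds A[i][j] (hypothetical encoding).
    """
    y = 0
    for i, row in enumerate(rows):
        # Dot product of row i with x modulo 2 = parity of the shared set bits.
        bit = bin(row & x).count("1") & 1
        y |= bit << i
    return y ^ c


def bmmc_permute(rows, c, src):
    """Scatter src[x] to position bmmc_index(rows, c, x).

    Assumes len(src) == 2**len(rows) and that A is invertible,
    so that every destination slot is written exactly once.
    """
    n = len(rows)
    assert len(src) == 1 << n
    dst = [None] * len(src)
    for x, value in enumerate(src):
        dst[bmmc_index(rows, c, x)] = value
    return dst


# Bit reversal on 3-bit indices: A is the anti-diagonal matrix, c = 0.
bit_reversal_rows = [0b100, 0b010, 0b001]
reversed_order = bmmc_permute(bit_reversal_rows, 0, list(range(8)))

# Array reversal: A is the identity, c has every bit set.
identity_rows = [0b001, 0b010, 0b100]
flipped = bmmc_permute(identity_rows, 0b111, list(range(8)))
```

Both bit reversal and array reversal fall in this class, which is why one kernel family can cover many common index shuffles. A GPU version would additionally have to arrange reads and writes so that memory accesses coalesce, which is the problem the abstract addresses.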


Pages: 15-28

Page count: 14

