Agile Queue: A Fast and Scalable Concurrent Queue on GPU

Times cited: 0

Authors:

Polak, Md Sabbir Hossain [1]

Troendle, David [1]

Jang, Byunghyun [1]

Affiliation:

[1] Univ Mississippi, University, MS 38677 USA

Source:

53rd International Conference on Parallel Processing, ICPP 2024 | 2024

Keywords:

Concurrent Queue; Data Structure; Parallel Computing;

DOI:

10.1145/3677333.3678269

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This work presents Agile Queue, a queue designed specifically to support high concurrency on modern GPUs. Its core idea is to replace conflicting accesses to shared objects with independent accesses to private data. The proposed Agile Queue operates at two granularities: thread block and warp. The thread-block granularity exploits greater parallelism among threads, but it requires a synchronization primitive to designate a master thread. The warp granularity, on the other hand, uses a work-sharing strategy among the threads of a warp without any synchronization, which reduces inherent branch divergence. Both variants support wrap-around of the head and tail across the ring buffer. Each request to the ring buffer generates a ticket that enforces strict ordering without fully blocking at queue boundary conditions. The thread-block variant uses shared memory to reduce global memory accesses, while in the warp variant the leader (first active) thread of the warp broadcasts the offset to all other lanes. Our experiments demonstrate the superior performance and scalability of Agile Queue over existing solutions. Specifically, it outperforms the BWD (Broker Queue Work Distributor), the fastest GPU queue to our knowledge, by more than 2x without compromising FIFO semantics.
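The warp-level step described above, where the leader (first active) thread makes a single update to the shared index and broadcasts the resulting offset to the other lanes, resembles the well-known warp-aggregated atomics pattern. The sketch below emulates that pattern in portable C++ on the CPU so it runs without a GPU; it is an illustration under stated assumptions, not the authors' implementation. The names `RingBuffer`, `warp_enqueue`, `leader_of`, and `rank_in_mask` are hypothetical, and the comments note which CUDA intrinsics (`__ffs`, `__popc`, `__shfl_sync`) each step would roughly correspond to on a device. The ring buffer simply maps monotonically increasing tickets onto slots with `ticket % capacity` and deliberately omits the full/empty boundary handling that the abstract mentions.

```cpp
#include <atomic>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// CPU emulation of one GPU warp; kWarpSize mirrors CUDA's warpSize.
constexpr int kWarpSize = 32;
constexpr std::uint64_t kNoTicket = std::numeric_limits<std::uint64_t>::max();

// Hypothetical ring buffer: tickets grow monotonically and wrap onto a
// fixed array via ticket % capacity. Full/empty checks are omitted.
struct RingBuffer {
    std::vector<int> slots;
    std::atomic<std::uint64_t> tail{0};  // next enqueue ticket

    explicit RingBuffer(std::size_t capacity) : slots(capacity, -1) {}
    std::size_t slot_of(std::uint64_t ticket) const {
        return static_cast<std::size_t>(ticket % slots.size());
    }
};

// Number of set bits in a 32-bit mask (CUDA: __popc).
int popcount32(std::uint32_t mask) {
    return static_cast<int>(std::bitset<32>(mask).count());
}

// Lowest active lane (CUDA: __ffs(mask) - 1); returns -1 for an empty mask.
int leader_of(std::uint32_t mask) {
    for (int lane = 0; lane < kWarpSize; ++lane)
        if (mask & (1u << lane)) return lane;
    return -1;
}

// Count of active lanes below `lane` (CUDA: __popc(mask & lanemask_lt)).
int rank_in_mask(std::uint32_t mask, int lane) {
    std::uint32_t below = (lane == 0) ? 0u : (mask & ((1u << lane) - 1u));
    return popcount32(below);
}

// Every lane set in active_mask (on a GPU: __activemask()) enqueues
// values[lane]. Returns each lane's ticket, or kNoTicket if inactive.
std::vector<std::uint64_t> warp_enqueue(RingBuffer& q, std::uint32_t active_mask,
                                        const std::vector<int>& values) {
    assert(values.size() == static_cast<std::size_t>(kWarpSize));
    std::vector<std::uint64_t> tickets(kWarpSize, kNoTicket);
    if (active_mask == 0) return tickets;

    // Only the leader touches the shared tail: one atomic per warp
    // instead of one per thread.
    std::vector<std::uint64_t> reg(kWarpSize, 0);  // per-lane "registers"
    int leader = leader_of(active_mask);
    reg[leader] = q.tail.fetch_add(static_cast<std::uint64_t>(popcount32(active_mask)));

    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (!(active_mask & (1u << lane))) continue;
        // Broadcast of the leader's base (CUDA: __shfl_sync(active_mask, base, leader)).
        std::uint64_t base = reg[leader];
        std::uint64_t ticket =
            base + static_cast<std::uint64_t>(rank_in_mask(active_mask, lane));
        tickets[lane] = ticket;
        q.slots[q.slot_of(ticket)] = values[lane];  // wrap-around via modulo
    }
    return tickets;
}
```

For example, calling `warp_enqueue` on a fresh queue with `active_mask = 0xB` (lanes 0, 1, and 3 active) hands out tickets 0, 1, and 2 and advances `tail` by 3 with a single `fetch_add`, and each lane derives its own slot without further contention. The thread-block variant would presumably follow the same arithmetic, except that a designated master thread places the base offset in shared memory and the block synchronizes (for example with `__syncthreads()`) before the other threads read it; this, too, is an assumption rather than a detail stated in the abstract.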

Pages: 108-109

Page count: 2