Agile Queue: A Fast and Scalable Concurrent Queue on GPU

Cited by: 0
Authors
Polak, Md Sabbir Hossain [1]
Troendle, David [1]
Jang, Byunghyun [1]
Affiliations
[1] Univ Mississippi, University, MS 38677 USA
Keywords
Concurrent Queue; Data Structure; Parallel Computing
DOI
10.1145/3677333.3678269
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work presents Agile Queue, a queue specifically designed to support high concurrency on modern GPUs. Its core idea is to replace conflicting accesses to shared objects with independent accesses to private data. The proposed Agile Queue operates at two different granularities: thread block and warp. While the thread block granularity exploits better parallelism among threads, it requires a synchronization primitive to designate a master thread. The warp granularity, on the other hand, uses a work-sharing strategy among the threads in a warp without any synchronization, which reduces the inherent branch divergence. Both variants support wrap-around of the head and tail across the ring buffer. Each request to the ring buffer generates a ticket that enforces strict ordering without fully blocking at queue boundary conditions. While the thread block variant uses shared memory to reduce global memory accesses, in the warp variant the leader (first active) thread broadcasts the offset to all other lanes in the warp. Our experiments demonstrate the superior performance and scalability of the Agile Queue over existing solutions. Specifically, it outperforms BWD (Broker Queue Work Distributor), the fastest GPU queue to our knowledge, by more than 2x without compromising FIFO semantics.
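The ticketing and offset-sharing ideas in the abstract can be illustrated with a small host-side model. The following is a minimal C++ sketch, not the authors' GPU implementation: the class `TicketRing`, its methods, and `run_demo` are hypothetical names, and the per-slot sequence counter is a common bounded-queue technique assumed here for illustration rather than taken from the paper. A single atomic add reserves a run of tickets for a whole group (standing in for the leader thread), and each member derives its own slot from the base ticket plus its lane index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only: all names here are hypothetical, not from the paper.
class TicketRing {
public:
    explicit TicketRing(std::size_t capacity)
        : slots_(capacity), seq_(capacity) {
        // Slot i is initially writable by ticket i (first pass over the ring).
        for (std::size_t i = 0; i < capacity; ++i) seq_[i].store(i);
    }

    // One atomic add reserves `count` consecutive tickets for a whole group.
    // This models a leader reserving space; member k then uses base + k.
    std::uint64_t reserve_enqueue(std::uint64_t count) { return tail_.fetch_add(count); }
    std::uint64_t reserve_dequeue(std::uint64_t count) { return head_.fetch_add(count); }

    // Writes `value` into the slot owned by `ticket`.
    // Returns false if the slot has not yet been freed for this ticket.
    bool try_put(std::uint64_t ticket, int value) {
        const std::size_t idx = ticket % slots_.size();  // wrap-around
        if (seq_[idx].load(std::memory_order_acquire) != ticket) return false;
        slots_[idx] = value;
        seq_[idx].store(ticket + 1, std::memory_order_release);
        return true;
    }

    // Reads the slot owned by `ticket`.
    // Returns false if the matching element has not been written yet.
    bool try_take(std::uint64_t ticket, int& out) {
        const std::size_t idx = ticket % slots_.size();  // wrap-around
        if (seq_[idx].load(std::memory_order_acquire) != ticket + 1) return false;
        out = slots_[idx];
        // Hand the slot to the enqueue ticket of the next pass over the ring.
        seq_[idx].store(ticket + slots_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<int> slots_;
    std::vector<std::atomic<std::uint64_t>> seq_;
    std::atomic<std::uint64_t> head_{0};
    std::atomic<std::uint64_t> tail_{0};
};

// Pushes and pops two groups of four through a 4-slot ring,
// so the second group reuses (wraps around to) the same slots.
std::vector<int> run_demo() {
    TicketRing q(4);
    std::vector<int> out;
    for (int round = 0; round < 2; ++round) {
        std::uint64_t base = q.reserve_enqueue(4);
        for (std::uint64_t lane = 0; lane < 4; ++lane)
            q.try_put(base + lane, round * 10 + static_cast<int>(lane));
        base = q.reserve_dequeue(4);
        for (std::uint64_t lane = 0; lane < 4; ++lane) {
            int v = 0;
            if (q.try_take(base + lane, v)) out.push_back(v);
        }
    }
    return out;
}
```

In this model, `run_demo()` returns the values in FIFO order across the wrap-around (0, 1, 2, 3, 10, 11, 12, 13). On a GPU, the group reservation would presumably be done by one lane and the base ticket shared with the rest of the warp or thread block; how the paper handles a ticket whose slot is not yet ready is not specified in the abstract, so the `false` return here is only a placeholder for that case.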
Pages: 108-109
Page count: 2