Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU

Cited by: 0
Authors
Liu, Xiaoyan [1 ]
Zheng, Xuegui [1 ]
Yang, Hailong [1 ]
Luan, Zhongzhi [1 ]
Qian, Depei [1 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
sparse convolution; GPU; performance optimization; CNN; multiplication;
DOI
10.1145/3627535.3638471
Chinese Library Classification
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
Convolutional neural networks (CNNs) have achieved remarkable success in various application fields. Although model compression techniques mitigate the ever-increasing resource demands of large CNN models, the compressed models usually exhibit irregular memory access and unstructured sparsity, which make it difficult for dominant operators such as sparse convolution to achieve the expected speedup on popular inference platforms such as the GPU. In this paper, we propose Tetris, an efficient sparse convolution approach optimized for the GPU. Tetris first fully exploits the input reuse opportunity of sparse convolution to reduce accesses to global memory. It then adopts a stride packed filter (SPF) format and a bank-sensing reorganization scheme to eliminate the irregular memory accesses caused by unstructured sparsity. It also leverages a filter group reorder technique to address load imbalance among threads, and a parameter tuning method to determine the optimal parameters of the sparse convolution implementation. The experimental results show that Tetris outperforms dense/sparse convolution libraries and cutting-edge implementations with promising speedups.
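To illustrate the general idea behind a packed sparse filter format, the sketch below (a NumPy illustration, not the paper's GPU implementation; the names `pack_filter` and `sparse_conv2d` are hypothetical) stores only a filter's nonzero weights together with their (channel, row, col) offsets, so the convolution loop skips zeros entirely while reusing the same input tile across all nonzeros:

```python
# Illustrative sketch of sparse convolution with a packed nonzero filter
# format, in the spirit of (but not identical to) Tetris's SPF format.
import numpy as np

def pack_filter(filt):
    """Pack a dense filter (C, KH, KW) into (values, offsets) of nonzeros."""
    idx = np.nonzero(filt)
    values = filt[idx]
    offsets = np.stack(idx, axis=1)  # each row is (c, kh, kw)
    return values, offsets

def sparse_conv2d(x, values, offsets, kh, kw):
    """Valid-mode convolution of input x (C, H, W) with one packed filter."""
    _, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for v, (c, i, j) in zip(values, offsets):
        # Each nonzero weight scales a shifted window of the input; the
        # same input tile is reused across all nonzeros (cf. input reuse).
        out += v * x[c, i:i + oh, j:j + ow]
    return out
```

On a GPU, the analogous kernel would stage the reused input tile in shared memory and lay out the packed weights to avoid bank conflicts, which is where the paper's bank-sensing reorganization comes in.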
Pages: 229-242
Page count: 14
Related papers
50 records in total
  • [1] Exploiting GPU memory hierarchy for accelerating a specialized stencil computation
    Balaiah, Thanasekhar
    Parthasarathi, Ranjani
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (21):
  • [2] Exploiting Reuse for GPU Subgraph Enumeration
    Guo, Wentian
    Li, Yuchen
    Tan, Kian-Lee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (09) : 4231 - 4244
  • [3] ADC-PIM: Accelerating Convolution on the GPU via In-Memory Approximate Data Comparison
    Choi, Jungwoo
    Lee, Hyuk-Jae
    Rhee, Chae Eun
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12 (02) : 458 - 471
  • [4] Accelerating Convolution-based Detection Model on GPU
    Liu, Qi
    Ruang, Zi
    Ru, Fuqiao
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ESTIMATION, DETECTION AND INFORMATION FUSION ICEDIF 2015, 2015, : 61 - 66
  • [5] Accelerating Hyperdimensional Computing on FPGAs by Exploiting Computational Reuse
    Salamat, Sahand
    Imani, Mohsen
    Rosing, Tajana
    IEEE TRANSACTIONS ON COMPUTERS, 2020, 69 (08) : 1159 - 1171
  • [6] On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond
    Labini, Paolo Sylos
    Cianfriglia, Marco
    Perri, Damiano
    Gervasi, Osvaldo
    Fursin, Grigori
    Lokhmotov, Anton
    Nugteren, Cedric
    Carpentieri, Bruno
    Zollo, Fabiana
    Vella, Flavio
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2021, 18 (01)
  • [7] Optimizing GPU Memory Transactions for Convolution Operations
    Lu, Gangzhao
    Zhang, Weizhe
    Wang, Zheng
    2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020, : 399 - 403
  • [8] Exploiting Direct Memory Operands in GPU Instructions
    Mohammadpur-Fard, Ali
    Darabi, Sina
    Falahati, Hajar
    Mahani, Negin
    Sarbazi-Azad, Hamid
    IEEE COMPUTER ARCHITECTURE LETTERS, 2024, 23 (02) : 162 - 165
  • [9] Accelerating Sparse Convolution with Column Vector-Wise Sparsity
    Tan, Yijun
    Han, Kai
    Zhao, Kang
    Yu, Xianzhi
    Du, Zidong
    Chen, Yunji
    Wang, Yunhe
    Yao, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [10] SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules
    Li, Jiajun
    Jiang, Shuhao
    Gong, Shijun
    Wu, Jingya
    Yan, Junchao
    Yan, Guihai
    Li, Xiaowei
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (11) : 1663 - 1677