Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU

Cited by: 0
Authors
Liu, Xiaoyan [1 ]
Zheng, Xuegui [1 ]
Yang, Hailong [1 ]
Luan, Zhongzhi [1 ]
Qian, Depei [1 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
sparse convolution; GPU; performance optimization; CNN; multiplication;
DOI
10.1145/3627535.3638471
Chinese Library Classification
TP301 [Theory and Methods];
Discipline code
081202;
Abstract
Convolutional neural networks (CNNs) have achieved remarkable success in various application fields. Although model compression techniques mitigate the ever-increasing resource demands of large CNN models, the compressed models usually exhibit irregular memory access and unstructured sparsity, which make it difficult for dominant operators such as sparse convolution to achieve the expected speedup on popular inference platforms such as the GPU. In this paper, we propose Tetris, an efficient sparse convolution approach optimized for the GPU. Tetris first fully exploits the input reuse opportunity of sparse convolution to reduce accesses to global memory. It then adopts a stride packed filter (SPF) format and a bank-sensing reorganization scheme to eliminate the irregular memory accesses caused by unstructured sparsity. It also leverages a filter group reorder technique to address load imbalance among threads, and a parameter tuning method to determine the optimal parameters of the sparse convolution implementation. The experimental results show that Tetris outperforms dense/sparse convolution libraries and cutting-edge implementations with promising speedups.
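To illustrate the general idea behind a packed sparse filter format, the sketch below (a NumPy illustration, not the paper's GPU implementation; the names `pack_filter` and `sparse_conv2d` are hypothetical) stores only a filter's nonzero weights together with their (channel, row, col) offsets, so the convolution loop skips zeros entirely while reusing the same input tile across all nonzeros:

```python
# Illustrative sketch of sparse convolution with a packed nonzero filter
# format, in the spirit of (but not identical to) Tetris's SPF format.
import numpy as np

def pack_filter(filt):
    """Pack a dense filter (C, KH, KW) into (values, offsets) of nonzeros."""
    idx = np.nonzero(filt)
    values = filt[idx]
    offsets = np.stack(idx, axis=1)  # each row is (c, kh, kw)
    return values, offsets

def sparse_conv2d(x, values, offsets, kh, kw):
    """Valid-mode convolution of input x (C, H, W) with one packed filter."""
    _, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for v, (c, i, j) in zip(values, offsets):
        # Each nonzero weight scales a shifted window of the input; the
        # same input tile is reused across all nonzeros (cf. input reuse).
        out += v * x[c, i:i + oh, j:j + ow]
    return out
```

On a GPU, the analogous kernel would stage the reused input tile in shared memory and lay out the packed weights to avoid bank conflicts, which is where the paper's bank-sensing reorganization comes in.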
Pages: 229-242
Page count: 14
Related papers
50 records in total
  • [1] Exploiting GPU memory hierarchy for accelerating a specialized stencil computation
    Balaiah, Thanasekhar
    Parthasarathi, Ranjani
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (21):
  • [2] Exploiting Reuse for GPU Subgraph Enumeration
    Guo, Wentian
    Li, Yuchen
    Tan, Kian-Lee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (09) : 4231 - 4244
  • [3] ADC-PIM: Accelerating Convolution on the GPU via In-Memory Approximate Data Comparison
    Choi, Jungwoo
    Lee, Hyuk-Jae
    Rhee, Chae Eun
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12 (02) : 458 - 471
  • [4] Accelerating Convolution-based Detection Model on GPU
    Liu, Qi
    Ruang, Zi
    Ru, Fuqiao
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ESTIMATION, DETECTION AND INFORMATION FUSION ICEDIF 2015, 2015, : 61 - 66
  • [5] Accelerating Hyperdimensional Computing on FPGAs by Exploiting Computational Reuse
    Salamat, Sahand
    Imani, Mohsen
    Rosing, Tajana
    IEEE TRANSACTIONS ON COMPUTERS, 2020, 69 (08) : 1159 - 1171
  • [6] On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond
    Labini, Paolo Sylos
    Cianfriglia, Marco
    Perri, Damiano
    Gervasi, Osvaldo
    Fursin, Grigori
    Lokhmotov, Anton
    Nugteren, Cedric
    Carpentieri, Bruno
    Zollo, Fabiana
    Vella, Flavio
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2021, 18 (01)
  • [7] Optimizing GPU Memory Transactions for Convolution Operations
    Lu, Gangzhao
    Zhang, Weizhe
    Wang, Zheng
    2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020, : 399 - 403
  • [8] Exploiting Direct Memory Operands in GPU Instructions
    Mohammadpur-Fard, Ali
    Darabi, Sina
    Falahati, Hajar
    Mahani, Negin
    Sarbazi-Azad, Hamid
    IEEE COMPUTER ARCHITECTURE LETTERS, 2024, 23 (02) : 162 - 165
  • [9] Accelerating Sparse Convolution with Column Vector-Wise Sparsity
    Tan, Yijun
    Han, Kai
    Zhao, Kang
    Yu, Xianzhi
    Du, Zidong
    Chen, Yunji
    Wang, Yunhe
    Yao, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [10] SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules
    Li, Jiajun
    Jiang, Shuhao
    Gong, Shijun
    Wu, Jingya
    Yan, Junchao
    Yan, Guihai
    Li, Xiaowei
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (11) : 1663 - 1677