Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors: Oh, Chanyoung [1,2]; So, Junhyuk [2]; Kim, Sumin [2]; Yi, Youngmin [2]
Affiliations:
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] Univ Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords: On-device deep learning; convolutional neural network; sparsity
DOI: 10.1145/3477008
Chinese Library Classification: TP3 [Computing Technology, Computer Technology]
Discipline Code: 0812
Abstract
Over the past several years, the need for on-device deep learning has grown rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but existing studies mostly focus on weight sparsity obtained by pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor; hence, research on activation sparsity has mainly targeted NPUs, which can handle it efficiently with dedicated hardware logic. In this paper, we observe that natural activation sparsity is insufficient to accelerate CNN inference on mobile GPUs and that the widely used CSR-based sparse convolution is not effective enough due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to extremely high sparsity and choose between sparse and dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and show that it can outperform the CSR implementation. With ResNet-50, we achieve a 1.88x speedup over TFLite on a Mali-G76 GPU.
Pages: 25
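To make the abstract's contrast concrete, below is a minimal sketch (not the authors' kernel) of the trade-off between CSR-based sparse convolution and compression-free zero skipping, assuming the convolution has been lowered to a matrix multiplication via im2col. The shapes, sparsity levels, and the row-skipping loop are illustrative assumptions only.

    import numpy as np
    from scipy.sparse import csr_matrix

    # Hypothetical shapes for one conv layer lowered to GEMM via im2col:
    # activation matrix A (M x K) is sparse after ReLU; weights W (K x N) are dense.
    M, K, N = 196, 576, 64
    rng = np.random.default_rng(0)
    A = np.maximum(rng.standard_normal((M, K)), 0.0)  # ReLU zeroes ~50% of entries
    A[rng.random((M, K)) < 0.4] = 0.0                 # assume boosted sparsity
    W = rng.standard_normal((K, N))

    # CSR-based sparse convolution: the compression step (building the
    # indptr/indices/data arrays) must run at inference time for every new
    # input tensor -- this is the runtime overhead the paper identifies.
    A_csr = csr_matrix(A)          # per-input compression
    out_csr = A_csr @ W            # sparse-dense GEMM

    # Compression-free skipping (illustrative only): operate directly on the
    # dense layout and skip zeros on the fly, so no index arrays are built.
    out_skip = np.zeros((M, N))
    for i in range(M):
        nz = np.flatnonzero(A[i])           # nonzero columns of this row
        if nz.size:
            out_skip[i] = A[i, nz] @ W[nz]  # accumulate only nonzero terms

    assert np.allclose(np.asarray(out_csr), out_skip)

The point of the sketch is that csr_matrix(A) must be rebuilt for every input at inference time, whereas the skipping variant reads the dense layout directly; the compression-free sparse convolution the abstract describes avoids that per-input compression step on the GPU.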
Related Papers (50 items in total)
  • [21] Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation. Fan, Zhihua; Li, Wenming; Wang, Zhen; Liu, Tianyu; Wu, Haibin; Liu, Yanhuan; Wu, Meng; Wu, Xinxin; Ye, Xiaochun; Fan, Dongrui; Sun, Ninghui; An, Xuejun. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(12): 3253-3265.
  • [22] Fast Bayesian Inference of Sparse Networks with Automatic Sparsity Determination. Yu, Hang; Wu, Songwei; Xin, Luyin; Dauwels, Justin. Journal of Machine Learning Research, 2020, 21.
  • [24] Work-in-Progress: Flexible Group-Level Pruning of Deep Neural Networks for Fast Inference on Mobile GPUs. Lee, Kwangbae; Kim, Hoseung; Lee, Hayun; Shin, Dongkun. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), 2019.
  • [25] Boosting Mobile CNN Inference through Semantic Memory. Li, Yun; Zhang, Chen; Han, Shihao; Zhang, Li Lyna; Yin, Baoqun; Liu, Yunxin; Xu, Mengwei. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2362-2371.
  • [26] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs. Han, Lixiang; Zhou, Zimu; Li, Zhenjiang. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys 2024), 2024: 465-478.
  • [27] Performance Evaluation of INT8 Quantized Inference on Mobile GPUs. Kim, Sumin; Park, Gunju; Yi, Youngmin. IEEE Access, 2021, 9: 164245-164255.
  • [28] Fast CNN Inference by Adaptive Sparse Matrix Decomposition. Tian, Nannan; Liu, Yong; Wang, Weiping; Meng, Dan. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [29] PAQSIM: Fast Performance Model for Graphics Workload on Mobile GPUs. Gong, Xiang; Hu, Chunling; Lim, Chu-Cheow. 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '20), 2020: 3-13.
  • [30] Exploiting bit sparsity in both activation and weight in neural networks accelerators. Jing, Naifeng; Zhang, Zihan; Sun, Yongshuai; Liu, Pengyu; Chen, Liyan; Wang, Qin; Jiang, Jianfei. Integration, the VLSI Journal, 2023, 88: 400-409.