Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors
Oh, Chanyoung [1 ,2 ]
So, Junhyuk [2 ]
Kim, Sumin [2 ]
Yi, Youngmin [2 ]
Affiliations
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] Univ Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords
On-device deep learning; convolutional neural network; sparsity
DOI
10.1145/3477008
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Subject Classification
0812
Abstract
Over the past several years, the need for on-device deep learning has been growing rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but most studies focus on weight sparsity obtained through weight pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor, so research on it has mainly targeted NPUs, which can handle this compression efficiently with dedicated hardware logic. In this paper, we observe that the natural activation sparsity of CNNs is difficult to exploit for faster inference on mobile GPUs, and that the widely used CSR-based sparse convolution is not sufficiently effective due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and adopt sparse or dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieve a 1.88x speedup over TFLite on a Mali-G76 GPU.
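To make the abstract's central contrast concrete, the following is a minimal NumPy/SciPy sketch, not the authors' Mali-GPU kernels: the 1x1-convolution framing, the tensor shapes, and the 90% sparsity level are illustrative assumptions. It shows the two sparse-convolution paths the abstract describes, a CSR-based path that must compress the activations at runtime for every input tensor, and a compression-free path that keeps the activations dense and simply skips zeros.

```python
# Minimal sketch (assumed shapes and sparsity; not the paper's GPU code) of a
# 1x1 convolution out = W @ A, where A holds sparse ReLU activations laid out
# as (in_channels, H*W).
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
C_in, C_out, HW = 64, 32, 256
W = rng.standard_normal((C_out, C_in)).astype(np.float32)

# ReLU-style activations, sparsified to ~90% zeros as in a "boosted" layer.
A = rng.standard_normal((C_in, HW)).astype(np.float32)
A[rng.random(A.shape) < 0.9] = 0.0

# Path 1: CSR-based sparse convolution. The activations must be compressed
# to CSR at runtime for every input tensor -- the overhead the paper targets.
A_csr = csr_matrix(A)                 # per-input compression step
out_csr = np.asarray(W @ A_csr)       # sparse matmul on the compressed form

# Path 2: compression-free sparse convolution. Keep A dense and skip zero
# entries while accumulating outer products, so no format conversion is paid.
out_skip = np.zeros((C_out, HW), dtype=np.float32)
for c in range(C_in):
    nz = A[c] != 0.0                  # a zero test replaces CSR construction
    if nz.any():
        out_skip[:, nz] += np.outer(W[:, c], A[c, nz])

# Both paths compute the same dense convolution result.
assert np.allclose(out_csr, out_skip, atol=1e-3)
```

The per-layer dispatch the abstract mentions amounts to choosing between the plain dense product `W @ A` and a zero-skipping path like Path 2, depending on whether a given layer was sparsified to an extremely high sparsity.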
Pages: 25
Related Papers
50 items in total; first 10 shown
  • [1] Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference
    Kurtz, Mark; Kopinsky, Justin; Gelashvili, Rati; Matveev, Alexander; Carr, John; Goin, Michael; Leiserson, William; Moore, Sage; Shavit, Nir; Alistarh, Dan
    International Conference on Machine Learning (ICML), Vol. 119, 2020
  • [2] CluSpa: Computation Reduction in CNN Inference by Exploiting Clustering and Sparsity
    Longchar, Imlijungla; Varhade, Amey A.; Ingle, Chetan P.; Baranwal, Saurabh; Kapoor, Hemangee K.
    Second International Conference on AI-ML Systems (AIMLSystems), 2022
  • [3] SparseFT: Sparsity-aware Fault Tolerance for Reliable CNN Inference on GPUs
    Byeon, Gwangeun; Lee, Seungtae; Kim, Seongwook; Kim, Yongjun; Nair, Prashant J.; Hong, Seokin
    32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), 2023, pp. 337-338
  • [4] Efficient Inference for Pruned CNN Models on Mobile Devices With Holistic Sparsity Alignment
    Jin, Yuyang; Zhong, Runxin; Long, Saiqin; Zhai, Jidong
    IEEE Transactions on Parallel and Distributed Systems, 35(11): 2208-2223, 2024
  • [5] PASS: Exploiting Post-Activation Sparsity in Streaming Architectures for CNN Acceleration
    Montgomerie-Corcoran, Alexander; Yu, Zhewen; Cheng, Jianyi; Bouganis, Christos-Savvas
    33rd International Conference on Field-Programmable Logic and Applications (FPL), 2023, pp. 288-293
  • [6] Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs
    Xu, Weizhi; Sun, Yintai; Fan, Shengyu; Yu, Hui; Fu, Xin
    ACM Transactions on Architecture and Code Optimization, 20(3), 2023
  • [7] AdaS: A Fast and Energy-Efficient CNN Accelerator Exploiting Bit-Sparsity
    Lin, Xiaolong; Li, Gang; Liu, Zizhao; Liu, Yadong; Zhang, Fan; Song, Zhuoran; Jing, Naifeng; Liang, Xiaoyao
    60th ACM/IEEE Design Automation Conference (DAC), 2023
  • [8] Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs
    Xie, Xinfeng; Du, Dayou; Li, Qian; Liang, Yun; Tang, Wai Teng; Ong, Zhong Liang; Lu, Mian; Huynh Phung Huynh; Goh, Rick Siow Mong
    ACM Transactions on Embedded Computing Systems, 17(2), 2018
  • [9] SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
    Wang, Ziheng
    PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 2020, pp. 31-42
  • [10] Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
    Li, Yuchao; Lin, Shaohui; Zhang, Baochang; Liu, Jianzhuang; Doermann, David; Wu, Yongjian; Huang, Feiyue; Ji, Rongrong
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2795-2804