Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors: Oh, Chanyoung [1,2]; So, Junhyuk [2]; Kim, Sumin [2]; Yi, Youngmin [2]
Affiliations:
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] Univ Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords: On-device deep learning; convolutional neural network; sparsity
DOI: 10.1145/3477008
Chinese Library Classification: TP3 [Computing Technology, Computer Technology]
Discipline Code: 0812
Abstract
Over the past several years, the need for on-device deep learning has grown rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but existing studies mostly focus on weight sparsity obtained by pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor; hence, research on activation sparsity has mainly targeted NPUs, which can handle it efficiently with dedicated hardware logic. In this paper, we observe that natural activation sparsity is insufficient to accelerate CNN inference on mobile GPUs and that the widely used CSR-based sparse convolution is not effective enough due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to extremely high sparsity and choose between sparse and dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and show that it can outperform the CSR implementation. With ResNet-50, we achieve a 1.88x speedup over TFLite on a Mali-G76 GPU.
Pages: 25
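To make the abstract's contrast concrete, below is a minimal sketch (not the authors' kernel) of the trade-off between CSR-based sparse convolution and compression-free zero skipping, assuming the convolution has been lowered to a matrix multiplication via im2col. The shapes, sparsity levels, and the row-skipping loop are illustrative assumptions only.

    import numpy as np
    from scipy.sparse import csr_matrix

    # Hypothetical shapes for one conv layer lowered to GEMM via im2col:
    # activation matrix A (M x K) is sparse after ReLU; weights W (K x N) are dense.
    M, K, N = 196, 576, 64
    rng = np.random.default_rng(0)
    A = np.maximum(rng.standard_normal((M, K)), 0.0)  # ReLU zeroes ~50% of entries
    A[rng.random((M, K)) < 0.4] = 0.0                 # assume boosted sparsity
    W = rng.standard_normal((K, N))

    # CSR-based sparse convolution: the compression step (building the
    # indptr/indices/data arrays) must run at inference time for every new
    # input tensor -- this is the runtime overhead the paper identifies.
    A_csr = csr_matrix(A)          # per-input compression
    out_csr = A_csr @ W            # sparse-dense GEMM

    # Compression-free skipping (illustrative only): operate directly on the
    # dense layout and skip zeros on the fly, so no index arrays are built.
    out_skip = np.zeros((M, N))
    for i in range(M):
        nz = np.flatnonzero(A[i])           # nonzero columns of this row
        if nz.size:
            out_skip[i] = A[i, nz] @ W[nz]  # accumulate only nonzero terms

    assert np.allclose(np.asarray(out_csr), out_skip)

The point of the sketch is that csr_matrix(A) must be rebuilt for every input at inference time, whereas the skipping variant reads the dense layout directly; the compression-free sparse convolution the abstract describes avoids that per-input compression step on the GPU.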
Related Papers (50 items in total)
  • [21] Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation. Fan, Zhihua; Li, Wenming; Wang, Zhen; Liu, Tianyu; Wu, Haibin; Liu, Yanhuan; Wu, Meng; Wu, Xinxin; Ye, Xiaochun; Fan, Dongrui; Sun, Ninghui; An, Xuejun. IEEE Transactions on Parallel and Distributed Systems, 2023, 34(12): 3253-3265.
  • [22] Fast Bayesian Inference of Sparse Networks with Automatic Sparsity Determination. Yu, Hang; Wu, Songwei; Xin, Luyin; Dauwels, Justin. Journal of Machine Learning Research, 2020, 21.
  • [24] Work-in-Progress: Flexible Group-Level Pruning of Deep Neural Networks for Fast Inference on Mobile GPUs. Lee, Kwangbae; Kim, Hoseung; Lee, Hayun; Shin, Dongkun. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), 2019.
  • [25] Boosting Mobile CNN Inference through Semantic Memory. Li, Yun; Zhang, Chen; Han, Shihao; Zhang, Li Lyna; Yin, Baoqun; Liu, Yunxin; Xu, Mengwei. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 2362-2371.
  • [26] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs. Han, Lixiang; Zhou, Zimu; Li, Zhenjiang. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys 2024), 2024: 465-478.
  • [27] Performance Evaluation of INT8 Quantized Inference on Mobile GPUs. Kim, Sumin; Park, Gunju; Yi, Youngmin. IEEE Access, 2021, 9: 164245-164255.
  • [28] Fast CNN Inference by Adaptive Sparse Matrix Decomposition. Tian, Nannan; Liu, Yong; Wang, Weiping; Meng, Dan. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [29] PAQSIM: Fast Performance Model for Graphics Workload on Mobile GPUs. Gong, Xiang; Hu, Chunling; Lim, Chu-Cheow. 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '20), 2020: 3-13.
  • [30] Exploiting bit sparsity in both activation and weight in neural networks accelerators. Jing, Naifeng; Zhang, Zihan; Sun, Yongshuai; Liu, Pengyu; Chen, Liyan; Wang, Qin; Jiang, Jianfei. Integration, the VLSI Journal, 2023, 88: 400-409.