Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors
Oh, Chanyoung [1 ,2 ]
So, Junhyuk [2 ]
Kim, Sumin [2 ]
Yi, Youngmin [2 ]
Affiliations
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] Univ Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords
On-device deep learning; convolutional neural network; sparsity
DOI
10.1145/3477008
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Subject Classification
0812
Abstract
Over the past several years, the need for on-device deep learning has been growing rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but most studies focus on weight sparsity obtained through weight pruning. Activation sparsity, in contrast, requires compression at runtime for every input tensor, so research on it has mainly targeted NPUs, which can handle this compression efficiently with dedicated hardware logic. In this paper, we observe that the natural activation sparsity of CNNs is difficult to exploit for faster inference on mobile GPUs, and that the widely used CSR-based sparse convolution is not sufficiently effective due to its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and adopt sparse or dense convolution on a per-layer basis. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieve a 1.88x speedup over TFLite on a Mali-G76 GPU.
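To make the abstract's central contrast concrete, the following is a minimal NumPy/SciPy sketch, not the authors' Mali-GPU kernels: the 1x1-convolution framing, the tensor shapes, and the 90% sparsity level are illustrative assumptions. It shows the two sparse-convolution paths the abstract describes, a CSR-based path that must compress the activations at runtime for every input tensor, and a compression-free path that keeps the activations dense and simply skips zeros.

```python
# Minimal sketch (assumed shapes and sparsity; not the paper's GPU code) of a
# 1x1 convolution out = W @ A, where A holds sparse ReLU activations laid out
# as (in_channels, H*W).
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
C_in, C_out, HW = 64, 32, 256
W = rng.standard_normal((C_out, C_in)).astype(np.float32)

# ReLU-style activations, sparsified to ~90% zeros as in a "boosted" layer.
A = rng.standard_normal((C_in, HW)).astype(np.float32)
A[rng.random(A.shape) < 0.9] = 0.0

# Path 1: CSR-based sparse convolution. The activations must be compressed
# to CSR at runtime for every input tensor -- the overhead the paper targets.
A_csr = csr_matrix(A)                 # per-input compression step
out_csr = np.asarray(W @ A_csr)       # sparse matmul on the compressed form

# Path 2: compression-free sparse convolution. Keep A dense and skip zero
# entries while accumulating outer products, so no format conversion is paid.
out_skip = np.zeros((C_out, HW), dtype=np.float32)
for c in range(C_in):
    nz = A[c] != 0.0                  # a zero test replaces CSR construction
    if nz.any():
        out_skip[:, nz] += np.outer(W[:, c], A[c, nz])

# Both paths compute the same dense convolution result.
assert np.allclose(out_csr, out_skip, atol=1e-3)
```

The per-layer dispatch the abstract mentions amounts to choosing between the plain dense product `W @ A` and a zero-skipping path like Path 2, depending on whether a given layer was sparsified to an extremely high sparsity.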
Pages: 25
Related Papers
50 items in total; first 10 shown
  • [1] Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference
    Kurtz, Mark; Kopinsky, Justin; Gelashvili, Rati; Matveev, Alexander; Carr, John; Goin, Michael; Leiserson, William; Moore, Sage; Shavit, Nir; Alistarh, Dan
    International Conference on Machine Learning (ICML), Vol. 119, 2020
  • [2] CluSpa: Computation Reduction in CNN Inference by Exploiting Clustering and Sparsity
    Longchar, Imlijungla; Varhade, Amey A.; Ingle, Chetan P.; Baranwal, Saurabh; Kapoor, Hemangee K.
    Second International Conference on AI-ML Systems (AIMLSystems), 2022
  • [3] SparseFT: Sparsity-aware Fault Tolerance for Reliable CNN Inference on GPUs
    Byeon, Gwangeun; Lee, Seungtae; Kim, Seongwook; Kim, Yongjun; Nair, Prashant J.; Hong, Seokin
    32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), 2023, pp. 337-338
  • [4] Efficient Inference for Pruned CNN Models on Mobile Devices With Holistic Sparsity Alignment
    Jin, Yuyang; Zhong, Runxin; Long, Saiqin; Zhai, Jidong
    IEEE Transactions on Parallel and Distributed Systems, 35(11): 2208-2223, 2024
  • [5] PASS: Exploiting Post-Activation Sparsity in Streaming Architectures for CNN Acceleration
    Montgomerie-Corcoran, Alexander; Yu, Zhewen; Cheng, Jianyi; Bouganis, Christos-Savvas
    33rd International Conference on Field-Programmable Logic and Applications (FPL), 2023, pp. 288-293
  • [6] Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs
    Xu, Weizhi; Sun, Yintai; Fan, Shengyu; Yu, Hui; Fu, Xin
    ACM Transactions on Architecture and Code Optimization, 20(3), 2023
  • [7] AdaS: A Fast and Energy-Efficient CNN Accelerator Exploiting Bit-Sparsity
    Lin, Xiaolong; Li, Gang; Liu, Zizhao; Liu, Yadong; Zhang, Fan; Song, Zhuoran; Jing, Naifeng; Liang, Xiaoyao
    60th ACM/IEEE Design Automation Conference (DAC), 2023
  • [8] Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs
    Xie, Xinfeng; Du, Dayou; Li, Qian; Liang, Yun; Tang, Wai Teng; Ong, Zhong Liang; Lu, Mian; Huynh Phung Huynh; Goh, Rick Siow Mong
    ACM Transactions on Embedded Computing Systems, 17(2), 2018
  • [9] SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
    Wang, Ziheng
    PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 2020, pp. 31-42
  • [10] Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression
    Li, Yuchao; Lin, Shaohui; Zhang, Baochang; Liu, Jianzhuang; Doermann, David; Wu, Yongjian; Huang, Feiyue; Ji, Rongrong
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2795-2804