Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs

Cited by: 4
Authors
Oh, Chanyoung [1, 2]
So, Junhyuk [2]
Kim, Sumin [2]
Yi, Youngmin [2]
Affiliations
[1] KT AI2XL, Taebong Ro 151, Seoul 06763, South Korea
[2] Univ Seoul, Seoulsiripdae Ro 163, Seoul, South Korea
Keywords
On-device deep learning; convolutional neural network; sparsity
DOI
10.1145/3477008
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Over the past several years, the need for on-device deep learning has grown rapidly, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but most studies focus on weight sparsity obtained through weight pruning. Activation sparsity, by contrast, requires compression at runtime for every input tensor. Hence, research on activation sparsity mainly targets NPUs, which can process it efficiently with dedicated hardware logic. In this paper, we observe that it is difficult to accelerate CNN inference on mobile GPUs with natural activation sparsity and that the widely used CSR-based sparse convolution is not sufficiently effective because of its compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and apply sparse or dense convolution depending on the layer. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieved a 1.88x speedup over TFLite on a Mali-G76 GPU.
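To make the abstract's terms concrete, the following is a minimal NumPy sketch, not the authors' implementation. It treats a 1x1 convolution as a matrix product between an activation matrix (pixels x input channels) and a weight matrix (input channels x output channels), and compares a CSR-based product, which must first compress the activations, with a zero-skipping product that reads the dense activations directly. All function names are hypothetical, the code runs on the CPU, and it only illustrates the data flow; it says nothing about GPU performance.

```python
import numpy as np

def relu(x):
    # ReLU is the usual source of "natural" activation sparsity.
    return np.maximum(x, 0.0)

def sparsity(x):
    # Fraction of entries that are exactly zero.
    return 1.0 - np.count_nonzero(x) / x.size

def to_csr(dense):
    # Runtime compression step: build CSR arrays from a 2-D activation matrix.
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return (np.array(values, dtype=dense.dtype),
            np.array(col_idx, dtype=np.int64),
            np.array(row_ptr, dtype=np.int64))

def csr_matmul(values, col_idx, row_ptr, weights):
    # Multiply CSR activations (M x K) by dense weights (K x N).
    m = len(row_ptr) - 1
    out = np.zeros((m, weights.shape[1]), dtype=weights.dtype)
    for i in range(m):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            out[i] += values[p] * weights[col_idx[p]]
    return out

def zero_skip_matmul(acts, weights):
    # Same product without compression: test each activation and skip zeros.
    out = np.zeros((acts.shape[0], weights.shape[1]), dtype=weights.dtype)
    for i in range(acts.shape[0]):
        for k in range(acts.shape[1]):
            a = acts[i, k]
            if a != 0.0:
                out[i] += a * weights[k]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = relu(rng.standard_normal((16, 8)))   # 16 pixels, 8 input channels
    weights = rng.standard_normal((8, 4))       # 8 input, 4 output channels
    vals, cols, ptr = to_csr(acts)
    dense_out = acts @ weights
    assert np.allclose(csr_matmul(vals, cols, ptr, weights), dense_out)
    assert np.allclose(zero_skip_matmul(acts, weights), dense_out)
    print(f"activation sparsity: {sparsity(acts):.2f}")
```

Both sparse variants perform one multiply-accumulate per nonzero activation; the difference is that the CSR path pays for `to_csr` on every input tensor, which is the compression overhead the abstract refers to.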
Pages: 25