Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliations
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
DOI
10.1109/CVPRW.2018.00221
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to grow, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline that consists of a series of model compression methods, including Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test the inference optimization pipeline with the above methods in several modeling scenarios, and the results are promising: inference becomes more efficient with only marginal loss of model accuracy.
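The abstract only names the pipeline stages, so the following is a minimal, hypothetical PyTorch sketch of one such stage, knowledge distillation, in which a compact student network is trained to match a larger teacher's softened output distribution. It is not the authors' implementation; the layer sizes, temperature T, and weight alpha are arbitrary placeholder choices.

```python
# Hypothetical illustration of knowledge distillation (KD), one stage of a
# model compression pipeline. Not the paper's implementation; all sizes and
# hyperparameters below are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes stable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy setup: a frozen teacher and a much smaller student; only the student trains.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 128)            # dummy batch of features
y = torch.randint(0, 10, (32,))     # dummy class labels
with torch.no_grad():               # teacher only provides soft targets
    teacher_logits = teacher(x)

optimizer.zero_grad()
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
optimizer.step()
```

In a full pipeline of the kind described above, such a distilled student would typically be combined with pruning, low-rank decomposition of weight tensors, and post-training low-bit quantization.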
Pages: 1776-1783
Number of pages: 8
Related Papers
50 records in total
  • [21] An efficient hybrid weather prediction model based on deep learning
    Utku, A.
    Can, U.
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL SCIENCE AND TECHNOLOGY, 2023, 20 (10) : 11107 - 11120
  • [23] An Efficient Indoor Localization Based on Deep Attention Learning Model
    Abozeid, A.
    Taloba, A.I.
    Abd El-Aziz, R.M.
    Alwaghid, A.F.
    Salem, M.
    Elhadad, A.
    Computer Systems Science and Engineering, 2023, 46 (02) : 2637 - 2650
  • [24] Energy-efficient deep learning inference on edge devices
    Daghero, Francesco
    Pagliari, Daniele Jahier
    Poncino, Massimo
    HARDWARE ACCELERATOR SYSTEMS FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 2021, 122 : 247 - 301
  • [25] EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models
    Li, X.
    Parazeres, M.
    Oberman, A.
    Ghaffari, A.
    Asgharian, M.
    Nia, V.P.
    SN Computer Science, 4 (5)
  • [26] FPGA Logic Block Architectures for Efficient Deep Learning Inference
    Eldafrawy, Mohamed
    Boutros, Andrew
    Yazdanshenas, Sadegh
    Betz, Vaughn
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2020, 13 (03)
  • [27] Survey of Deep Learning Model Compression and Acceleration
    Gao, H.
    Tian, Y.-L.
    Xu, F.-Y.
    Zhong, S.
    Ruan Jian Xue Bao/Journal of Software, 2021, 32 (01) : 68 - 92
  • [28] Combining deep learning model compression techniques
    Santos Silva, Jose Vitor
    Matos Matos, Leonardo
    Santos, Flavio
    Magalhaes Cerqueira, Helisson Oliveira
    Macedo, Hendrik
    Piedade Prado, Bruno Otavio
    Ferreira da Silva, Gilton Jose
    Bispo, Kalil Araujo
    IEEE LATIN AMERICA TRANSACTIONS, 2022, 20 (03) : 458 - 464
  • [29] A Novel Deep Learning Model Compression Algorithm
    Zhao, Ming
    Li, Meng
    Peng, Sheng-Lung
    Li, Jie
    ELECTRONICS, 2022, 11 (07)
  • [30] Model Compression for Communication Efficient Federated Learning
    Shah, Suhail Mohmad
    Lau, Vincent K. N.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5937 - 5951