Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliation
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
DOI
10.1109/CVPRW.2018.00221
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to increase, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline consisting of a series of model compression methods: Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test the pipeline with these methods in different modeling scenarios, and the results are promising: inference becomes more efficient with only a marginal loss of model accuracy.
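Illustrative note: the abstract names low-bit model quantization as one stage of the pipeline but does not describe the scheme used. The sketch below shows one common variant, symmetric per-tensor 8-bit quantization, purely to make the idea concrete. It is an assumption for illustration and not the authors' method; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers with one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this sketch each weight is stored in 8 bits instead of 32, and small values such as 0.003 collapse to zero, which is one way the "marginal loss of model accuracy" mentioned in the abstract can arise.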
Pages: 1776-1783
Page count: 8