Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliations
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
Keywords
DOI
10.1109/CVPRW.2018.00221
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to grow, making efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline consisting of a series of model compression methods: Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test the pipeline in a range of modeling scenarios, and the results show that it makes inference substantially more efficient with only a marginal loss of model accuracy.
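The record itself contains no code, but the last stage of the pipeline, low-bit model quantization, is concrete enough to sketch. The Python example below shows uniform symmetric post-training quantization of a weight tensor; the function names and the single per-tensor scale are illustrative assumptions, not the authors' implementation.

import numpy as np

def quantize_symmetric(w, num_bits=8):
    # Uniform symmetric quantization: map the float tensor onto signed
    # num_bits integers with zero-point 0 and one scale per tensor.
    # (Illustrative sketch; assumes w is not all zeros.)
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax                 # largest magnitude -> qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float tensor from the integer codes.
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 256)).astype(np.float32)
    for bits in (8, 4, 2):
        q, scale = quantize_symmetric(w, bits)
        err = np.abs(w - dequantize(q, scale)).max()
        print(f"{bits}-bit max abs reconstruction error: {err:.4f}")

Storage shrinks roughly linearly with bit width (8-bit codes are 4x smaller than float32) while reconstruction error grows, which is the efficiency-versus-accuracy trade-off the abstract describes.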
Pages: 1776-1783
Page count: 8
Related Papers
50 records in total
  • [1] To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference. Qin, Qing; Ren, Jie; Yu, Jialong; Gao, Ling; Wang, Hai; Zheng, Jie; Feng, Yansong; Fang, Jianbin; Wang, Zheng. 2018 IEEE Int Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications, 2018: 729-736.
  • [2] Deep Compression and EIE: Efficient Inference Engine on Compressed Deep Neural Network. Han, Song; Liu, Xingyu; Mao, Huizi; Pu, Jing; Pedram, Ardavan; Horowitz, Mark; Dally, Bill. 2016 IEEE Hot Chips 28 Symposium (HCS), 2016.
  • [3] Towards Model Compression for Deep Learning Based Speech Enhancement. Tan, Ke; Wang, DeLiang. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1785-1794.
  • [4] Compression and Transmission of Big AI Model Based on Deep Learning. Lin, Zhengping; Zhou, Yuzhong; Yang, Yuliang; Shi, Jiahao; Lin, Jie. EAI Endorsed Transactions on Scalable Information Systems, 2024, 11(2): 1-8.
  • [5] A Deep Learning Model Compression Algorithm Based on Optimal Clustering. Li, Wenzhen; Lu, Xu; Wu, Qirui; Zhang, Hailong; Luo, Hanwu; Lei, Cheng. Tenth International Conference on Graphics and Image Processing (ICGIP 2018), 2019, 11069.
  • [6] Weight Compression MAC Accelerator for Effective Inference of Deep Learning. Maki, Asuka; Miyashita, Daisuke; Sasaki, Shinichi; Nakata, Kengo; Tachibana, Fumihiko; Suzuki, Tomoya; Deguchi, Jun; Fujimoto, Ryuichi. IEICE Transactions on Electronics, 2020, E103C(10): 514-523.
  • [7] Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT. Bhardwaj, Kartikeya; Lin, Ching-Yi; Sartor, Anderson; Marculescu, Radu. ACM Transactions on Embedded Computing Systems, 2019, 18(5).
  • [8] An Efficient JPEG Steganalysis Model Based on Deep Learning. Gan, Lin; Cheng, Yang; Yang, Yu; Shen, Linfeng; Dong, Zhexuan. Security with Intelligent Computing and Big-Data Services, 2020, 895: 729-742.
  • [9] A Deep Learning-Based Model for Gene Regulatory Network Inference. Ma, Jialu; Epperson, Nathan; Talburt, John; Yang, Mary Qu. 2023 International Conference on Computational Science and Computational Intelligence (CSCI 2023), 2023: 546-550.
  • [10] Substation Operation Sequence Inference Model Based on Deep Reinforcement Learning. Chen, Tie; Li, Hongxin; Cao, Ying; Zhang, Zhifan. Applied Sciences-Basel, 2023, 13(13).