Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors:
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliation:
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
DOI: 10.1109/CVPRW.2018.00221
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to increase, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline that consists of a series of model compression methods, including Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test our inference optimization pipeline with the above methods in different modeling scenarios, and it shows promising results, making inference more efficient with only a marginal loss of model accuracy.
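For illustration, below is a minimal sketch of one of the techniques named in the abstract, knowledge distillation, written in PyTorch. The temperature T, the weight alpha, and the function name distillation_loss are generic assumptions for exposition, not details of the authors' pipeline.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student output distributions (teacher guidance).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so its gradient magnitude is comparable to the hard term
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In a typical setup, the student model would be trained by minimizing this loss while the (larger) teacher model is kept frozen and only used to produce teacher_logits.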
Pages: 1776-1783
Page count: 8