Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors:
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliation:
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
DOI: 10.1109/CVPRW.2018.00221
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to increase, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline that consists of a series of model compression methods, including Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test our inference optimization pipeline with the above methods in different modeling scenarios, and it shows promising results, making inference more efficient with only a marginal loss of model accuracy.
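For illustration, below is a minimal sketch of one of the techniques named in the abstract, knowledge distillation, written in PyTorch. The temperature T, the weight alpha, and the function name distillation_loss are generic assumptions for exposition, not details of the authors' pipeline.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student output distributions (teacher guidance).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so its gradient magnitude is comparable to the hard term
    # Hard-target term: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In a typical setup, the student model would be trained by minimizing this loss while the (larger) teacher model is kept frozen and only used to produce teacher_logits.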
Pages: 1776-1783
Page count: 8