Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision
DOI
10.3390/ai4040047
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks achieve high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage, in both the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed to run deep learning models efficiently during training and inference. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has examined the performance of the frameworks that provide these quantization techniques. In this paper, we use several performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These metrics include training time and memory utilization in the training stage, along with latency and throughput on graphics processing units (GPUs) in the inference stage. We apply the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we utilize the TensorRT framework for post-training quantization via the TensorFlow TensorRT (TF-TRT) application programming interface (API). We profiled different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages; the results can help developers and researchers devise and deploy efficient deep learning models on GPUs.
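
The training-side setup the abstract describes, enabling AMP in TensorFlow, amounts to setting a global Keras mixed-precision policy before the model is built. The following is a minimal sketch under that assumption, not the authors' actual training script; the toy network, input shape, and optimizer are illustrative choices.

import tensorflow as tf

# Enable automatic mixed precision: layers compute in float16
# while trainable variables stay in float32 for numerical stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Toy convolutional network (illustrative only; the paper profiles
# larger image-classification models at various input sizes).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Keep the final layer in float32 so the softmax and loss are
    # computed at full precision, as the TF mixed-precision guide advises.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # training then proceeds as usual

On GPUs with Tensor Cores, this policy typically shortens training time and lowers memory usage with little or no accuracy loss, which are precisely the training-stage metrics the paper measures.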
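On the inference side, post-training quantization through TF-TRT is, in broad strokes, a SavedModel-to-SavedModel conversion. A minimal sketch follows; the directory names are hypothetical, FP16 is shown for brevity (INT8 additionally requires a calibration input function), and the exact keyword arguments vary slightly across TensorFlow versions.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a trained SavedModel into a TensorRT-optimized SavedModel.
# "saved_model_dir" and "trt_model_dir" are hypothetical paths.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",
    conversion_params=params,
)
converter.convert()            # replaces supported subgraphs with TRT engines
converter.save("trt_model_dir")

Inference latency and throughput are then obtained by loading the converted model and timing batched predictions on the GPU, which is the comparison the abstract describes across models, image sizes, and batch sizes.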
Pages: 926-948
Page count: 23