Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision;
DOI
10.3390/ai4040047
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks achieve high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in both the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed for running deep learning models efficiently in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has examined the performance of the frameworks that provide these quantization techniques. In this paper, we use several performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These metrics include training time and memory utilization in the training stage, along with latency and throughput on graphics processing units (GPUs) in the inference stage. We apply the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we utilize the TensorRT framework for post-training quantization via the TensorFlow-TensorRT (TF-TRT) application programming interface (API). We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages; the results can help developers and researchers devise and deploy efficient deep learning models for GPUs.
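The AMP training and TF-TRT post-training quantization workflow named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual experimental code: the model architecture, input shape, SavedModel paths, and precision mode below are assumptions for demonstration only.

```python
# Sketch: enabling TensorFlow automatic mixed precision (AMP) for training.
import tensorflow as tf

# AMP: compute in float16 where numerically safe, keep float32 master weights.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Illustrative toy model (not from the paper).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the output layer in float32 for numerical stability under AMP.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Inference-side post-training quantization via the TF-TRT API (requires a
# TensorRT installation and a GPU; shown as an untested sketch with
# hypothetical SavedModel directories):
# from tensorflow.python.compiler.tensorrt import trt_convert as trt
# converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model_dir")
# converter.convert()
# converter.save("trt_saved_model_dir")
```

Under the `mixed_float16` policy, layer computations run in float16 while variables stay in float32, which is what reduces training time and memory traffic on GPUs with tensor cores.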
Pages: 926-948
Page count: 23
Related Papers
50 records in total
  • [21] A Deep Learning Based Intrusion Detection System on GPUs
    Karatas, Gozde
    Demir, Onder
    Sahingoz, Ozgur Koray
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTERS AND ARTIFICIAL INTELLIGENCE (ECAI-2019), 2019,
  • [22] Detailed Characterization of Deep Neural Networks on GPUs and FPGAs
    Karki, Aajna
    Keshava, Chethan Palangotu
    Shivakumar, Spoorthi Mysore
    Skow, Joshua
    Hegde, Goutam Madhukeshwar
    Jeon, Hyeran
    12TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPUS (GPGPU 12), 2019, : 12 - 21
  • [23] Performance Characterization of Mobile GP-GPUs
    Andargie, Fitsum Assamnew
    Rose, Jonathan
    PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON), 2015,
  • [24] Performance of Training Sparse Deep Neural Networks on GPUs
    Wang, Jianzong
    Huang, Zhangcheng
    Kong, Lingwei
    Xiao, Jing
    Wang, Pengyu
    Zhang, Lu
    Li, Chao
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019,
  • [25] Performance Analysis of Distributed Deep Learning Frameworks in a Multi-GPU Environment
    Kavarakuntla, Tulasi
    Han, Liangxiu
    Lloyd, Huw
    Latham, Annabel
    Akintoye, Samson B.
    20TH INT CONF ON UBIQUITOUS COMP AND COMMUNICAT (IUCC) / 20TH INT CONF ON COMP AND INFORMATION TECHNOLOGY (CIT) / 4TH INT CONF ON DATA SCIENCE AND COMPUTATIONAL INTELLIGENCE (DSCI) / 11TH INT CONF ON SMART COMPUTING, NETWORKING, AND SERV (SMARTCNS), 2021, : 406 - 413
  • [26] Object Storage for Deep Learning Frameworks
    Ozeri, Or
    Ofer, Effi
    Kat, Ronen
    DIDL'18: PROCEEDINGS OF THE SECOND WORKSHOP ON DISTRIBUTED INFRASTRUCTURES FOR DEEP LEARNING, 2018, : 21 - 24
  • [27] A Survey of Deep-learning Frameworks
    Parvat, Aniruddha
    Chavan, Jai
    Kadam, Siddhesh
    Dev, Souradeep
    Pathak, Vidhi
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2017), 2017, : 211 - 217
  • [28] Survey on Testing of Deep Learning Frameworks
    Ma, Xiang-Yue
    Du, Xiao-Ting
    Cai, Qing
    Zheng, Yang
    Hu, Zheng
    Zheng, Zheng
    Ruan Jian Xue Bao/Journal of Software, 2024, 35 (08): : 3752 - 3784
  • [29] Scalable Deep Learning-Based Microarchitecture Simulation on GPUs
    Pandey, Santosh
    Li, Lingda
    Flynn, Thomas
    Hoisie, Adolfy
    Liu, Hang
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [30] Fast Training of Deep Learning Models over Multiple GPUs
    Yi, Xiaodong
    Luo, Ziyue
    Meng, Chen
    Wang, Mengdi
    Long, Guoping
    Wu, Chuan
    Yang, Jun
    Lin, Wei
    PROCEEDINGS OF THE 2020 21ST INTERNATIONAL MIDDLEWARE CONFERENCE (MIDDLEWARE '20), 2020, : 105 - 118