Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision
DOI
10.3390/ai4040047
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks achieve high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage, in both the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed to run deep learning models efficiently during training and inference. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has examined the performance of the frameworks that provide these quantization techniques. In this paper, we use several performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These metrics include training time and memory utilization in the training stage, along with latency and throughput on graphics processing units (GPUs) in the inference stage. We apply the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we utilize the TensorRT framework for post-training quantization via the TensorFlow TensorRT (TF-TRT) application programming interface (API). We profiled different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages; the results can help developers and researchers devise and deploy efficient deep learning models on GPUs.
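
The training-side setup the abstract describes, enabling AMP in TensorFlow, amounts to setting a global Keras mixed-precision policy before the model is built. The following is a minimal sketch under that assumption, not the authors' actual training script; the toy network, input shape, and optimizer are illustrative choices.

import tensorflow as tf

# Enable automatic mixed precision: layers compute in float16
# while trainable variables stay in float32 for numerical stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Toy convolutional network (illustrative only; the paper profiles
# larger image-classification models at various input sizes).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Keep the final layer in float32 so the softmax and loss are
    # computed at full precision, as the TF mixed-precision guide advises.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # training then proceeds as usual

On GPUs with Tensor Cores, this policy typically shortens training time and lowers memory usage with little or no accuracy loss, which are precisely the training-stage metrics the paper measures.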
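On the inference side, post-training quantization through TF-TRT is, in broad strokes, a SavedModel-to-SavedModel conversion. A minimal sketch follows; the directory names are hypothetical, FP16 is shown for brevity (INT8 additionally requires a calibration input function), and the exact keyword arguments vary slightly across TensorFlow versions.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a trained SavedModel into a TensorRT-optimized SavedModel.
# "saved_model_dir" and "trt_model_dir" are hypothetical paths.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",
    conversion_params=params,
)
converter.convert()            # replaces supported subgraphs with TRT engines
converter.save("trt_model_dir")

Inference latency and throughput are then obtained by loading the converted model and timing batched predictions on the GPU, which is the comparison the abstract describes across models, image sizes, and batch sizes.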
Pages: 926-948
Page count: 23