An Empirical Study of Challenges in Converting Deep Learning Models

Cited by: 9
Authors
Openja, Moses [1 ]
Nikanjam, Amin [1 ]
Yahmed, Ahmed Haj [1 ]
Khomh, Foutse [1 ]
Jiang, Zhen Ming [2 ]
Affiliations
[1] Polytech Montreal, Montreal, PQ, Canada
[2] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Empirical; Deep Learning; Converting Trained Models; Deploying ML Models; Robustness;
DOI
10.1109/ICSME55016.2022.00010
CLC Classification Number
TP31 [Computer Software];
Subject Classification Number
081202; 0835;
Abstract
There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually, DL models are developed and trained using DL frameworks like TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats for representing and training DL models (deep neural networks), and those formats usually cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where they were developed. To solve this interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced for DL models. However, ONNX and CoreML have never been empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may lead to poor quality of deployed DL-based software systems. In this paper, we conduct the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to the two runtime environments designated for those formats, to be evaluated. We investigate the prediction accuracy before and after conversion. Our results reveal that the prediction accuracy of the converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of the converted models is studied as well. The size of the models is reduced after conversion, which can result in optimized DL-based software deployment. We also study the adversarial robustness of the converted models to ensure the robustness of the deployed DL-based software. Leveraging state-of-the-art adversarial attack approaches, the converted models are generally assessed to be as robust as the originals. However, the obtained results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious about deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) pose challenges for robust deployment, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks such as bug prediction.
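The convert-then-evaluate workflow the abstract describes can be illustrated in code. Below is a minimal sketch, assuming PyTorch, NumPy, and onnxruntime are installed; the toy convolutional model, the 32x32 input shape, and the epsilon value are illustrative assumptions, not the five models, three datasets, or attack configurations used in the study. The sketch exports a model to ONNX, checks that the converted model's predictions match the original's, and probes both models with a simple FGSM adversarial example.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import onnxruntime as ort

torch.manual_seed(0)
# Stand-in for a trained classifier (hypothetical, e.g. a small CIFAR-10 net).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
model.eval()

# Export the PyTorch model to the ONNX exchange format.
x = torch.randn(1, 3, 32, 32)  # stand-in input image
torch.onnx.export(model, x, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Load the converted model in ONNX Runtime, the runtime designated for ONNX.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

def onnx_logits(t):
    """Run the converted model on a torch tensor, return numpy logits."""
    return sess.run(None, {"input": t.detach().numpy()})[0]

# 1) Prediction accuracy: original vs. converted model on the same input.
with torch.no_grad():
    orig = model(x).numpy()
conv = onnx_logits(x)
print("max |orig - conv|:", np.abs(orig - conv).max())
print("labels agree:", bool((orig.argmax(1) == conv.argmax(1)).all()))

# 2) Adversarial robustness: craft an FGSM example against the original
#    model, then check whether either model changes its prediction on it.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), torch.from_numpy(orig.argmax(1)))
loss.backward()
x_adv = (x_adv + 0.03 * x_adv.grad.sign()).detach()  # epsilon = 0.03
with torch.no_grad():
    print("original label on adversarial input:",
          model(x_adv).argmax(1).item())
print("converted label on adversarial input:",
      onnx_logits(x_adv).argmax(1).item())

The CoreML path would be analogous: convert the trained model with coremltools (e.g., coremltools.convert on a traced PyTorch model) and evaluate it in Core ML's runtime on macOS; that path is platform-specific and omitted from this sketch.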
Pages: 13-23 (11 pages)
Related Papers
50 entries in total
  • [21] The state of the art of deep learning models in medical science and their challenges
    Bhatt, Chandradeep
    Kumar, Indrajeet
    Vijayakumar, V.
    Singh, Kamred Udham
    Kumar, Abhishek
    [J]. MULTIMEDIA SYSTEMS, 2021, 27 (04) : 599 - 613
  • [23] Deep learning: systematic review, models, challenges, and research directions
    Talaei Khoei, Tala
    Ould Slimane, Hadjar
    Kaabouch, Naima
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (31): : 23103 - 23124
  • [25] Bias and Generalization in Deep Generative Models: An Empirical Study
    Zhao, Shengjia
    Ren, Hongyu
    Yuan, Arianna
    Song, Jiaming
    Goodman, Noah
    Ermon, Stefano
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [26] A Study of Speaking Learning in on Deep Learning Assessment Models
    Sun, Can
    Abd Mahid, Faizah
    [J]. JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (09) : 712 - 719
  • [27] Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning
    Mazuera-Rozo, Alejandro
    Mojica-Hanke, Anamaria
    Linares-Vasquez, Mario
    Bavota, Gabriele
    [J]. 2021 IEEE/ACM 29TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2021), 2021, : 276 - 287
  • [28] An Empirical Study of the Dependency Networks of Deep Learning Libraries
    Han, Junxiao
    Deng, Shuiguang
    Lo, David
    Zhi, Chen
    Yin, Jianwei
    Xia, Xin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020), 2020, : 868 - 878
  • [29] An Empirical Study on Quality Issues of Deep Learning Platform
    Gao, Yanjie
    Shi, Xiaoxiang
    Lin, Haoxiang
    Zhang, Hongyu
    Wu, Hao
    Li, Rui
    Yang, Mao
    [J]. 2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE, ICSE-SEIP, 2023, : 455 - 466
  • [30] An Empirical Study of Fault Triggers in Deep Learning Frameworks
    Du, Xiaoting
    Sui, Yulei
    Liu, Zhihao
    Ai, Jun
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20 (04) : 2696 - 2712