An Empirical Study of Challenges in Converting Deep Learning Models

Cited by: 9
Authors
Openja, Moses [1 ]
Nikanjam, Amin [1 ]
Yahmed, Ahmed Haj [1 ]
Khomh, Foutse [1 ]
Jiang, Zhen Ming [2 ]
Affiliations
[1] Polytech Montreal, Montreal, PQ, Canada
[2] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Empirical; Deep Learning; Converting Trained Models; Deploying ML Models; Robustness;
DOI
10.1109/ICSME55016.2022.00010
CLC Classification Number
TP31 [Computer Software];
Subject Classification Number
081202; 0835;
Abstract
There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually, DL models are developed and trained using DL frameworks like TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats for representing and training DL models (deep neural networks), and those formats usually cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where they were developed. To solve this interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced for DL models. However, ONNX and CoreML have never been empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may lead to poor quality of deployed DL-based software systems. In this paper, we conduct the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to the two runtime environments designated for those formats, to be evaluated. We investigate the prediction accuracy before and after conversion. Our results reveal that the prediction accuracy of the converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of the converted models is studied as well. The size of the models is reduced after conversion, which can result in optimized DL-based software deployment. We also study the adversarial robustness of the converted models to ensure the robustness of the deployed DL-based software. Leveraging state-of-the-art adversarial attack approaches, the converted models are generally assessed to be as robust as the originals. However, the obtained results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious about deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) pose challenges for robust deployment, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks such as bug prediction.
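The convert-then-evaluate workflow the abstract describes can be illustrated in code. Below is a minimal sketch, assuming PyTorch, NumPy, and onnxruntime are installed; the toy convolutional model, the 32x32 input shape, and the epsilon value are illustrative assumptions, not the five models, three datasets, or attack configurations used in the study. The sketch exports a model to ONNX, checks that the converted model's predictions match the original's, and probes both models with a simple FGSM adversarial example.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import onnxruntime as ort

torch.manual_seed(0)
# Stand-in for a trained classifier (hypothetical, e.g. a small CIFAR-10 net).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
model.eval()

# Export the PyTorch model to the ONNX exchange format.
x = torch.randn(1, 3, 32, 32)  # stand-in input image
torch.onnx.export(model, x, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Load the converted model in ONNX Runtime, the runtime designated for ONNX.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

def onnx_logits(t):
    """Run the converted model on a torch tensor, return numpy logits."""
    return sess.run(None, {"input": t.detach().numpy()})[0]

# 1) Prediction accuracy: original vs. converted model on the same input.
with torch.no_grad():
    orig = model(x).numpy()
conv = onnx_logits(x)
print("max |orig - conv|:", np.abs(orig - conv).max())
print("labels agree:", bool((orig.argmax(1) == conv.argmax(1)).all()))

# 2) Adversarial robustness: craft an FGSM example against the original
#    model, then check whether either model changes its prediction on it.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), torch.from_numpy(orig.argmax(1)))
loss.backward()
x_adv = (x_adv + 0.03 * x_adv.grad.sign()).detach()  # epsilon = 0.03
with torch.no_grad():
    print("original label on adversarial input:",
          model(x_adv).argmax(1).item())
print("converted label on adversarial input:",
      onnx_logits(x_adv).argmax(1).item())

The CoreML path would be analogous: convert the trained model with coremltools (e.g., coremltools.convert on a traced PyTorch model) and evaluate it in Core ML's runtime on macOS; that path is platform-specific and omitted from this sketch.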
Pages: 13-23 (11 pages)
Related Papers
50 entries in total
  • [21] The state of the art of deep learning models in medical science and their challenges
    Bhatt, Chandradeep
    Kumar, Indrajeet
    Vijayakumar, V.
    Singh, Kamred Udham
    Kumar, Abhishek
    [J]. MULTIMEDIA SYSTEMS, 2021, 27 (04) : 599 - 613
  • [23] Deep learning: systematic review, models, challenges, and research directions
    Talaei Khoei, Tala
    Ould Slimane, Hadjar
    Kaabouch, Naima
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (31): : 23103 - 23124
  • [25] Bias and Generalization in Deep Generative Models: An Empirical Study
    Zhao, Shengjia
    Ren, Hongyu
    Yuan, Arianna
    Song, Jiaming
    Goodman, Noah
    Ermon, Stefano
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [26] A Study of Speaking Learning in on Deep Learning Assessment Models
    Sun, Can
    Abd Mahid, Faizah
    [J]. JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (09) : 712 - 719
  • [27] Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning
    Mazuera-Rozo, Alejandro
    Mojica-Hanke, Anamaria
    Linares-Vasquez, Mario
    Bavota, Gabriele
    [J]. 2021 IEEE/ACM 29TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2021), 2021, : 276 - 287
  • [28] An Empirical Study of the Dependency Networks of Deep Learning Libraries
    Han, Junxiao
    Deng, Shuiguang
    Lo, David
    Zhi, Chen
    Yin, Jianwei
    Xia, Xin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020), 2020, : 868 - 878
  • [29] An Empirical Study on Quality Issues of Deep Learning Platform
    Gao, Yanjie
    Shi, Xiaoxiang
    Lin, Haoxiang
    Zhang, Hongyu
    Wu, Hao
    Li, Rui
    Yang, Mao
    [J]. 2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE, ICSE-SEIP, 2023, : 455 - 466
  • [30] An Empirical Study of Fault Triggers in Deep Learning Frameworks
    Du, Xiaoting
    Sui, Yulei
    Liu, Zhihao
    Ai, Jun
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20 (04) : 2696 - 2712