Reproducibility of Training Deep Learning Models for Medical Image Analysis

Cited by: 0

Authors:

Bosma, Joeran Sander [1]

Peeters, Dre [1]

Alves, Natalia [1]

Saha, Anindo [1]

Saghir, Zaigham [2]

Jacobs, Colin [1]

Huisman, Henkjan [1]

Affiliations:

[1] Radboud Univ Nijmegen, Ctr Med, Diagnost Image Anal Grp, Dept Med Imaging, NL-6525 GA Nijmegen, Netherlands

[2] Herlev Gentofte Hosp, Sect Pulm Med, Dept Med, Hellerup, Denmark

Source:

MEDICAL IMAGING WITH DEEP LEARNING, VOL 227 | 2023 / Vol. 227

Keywords:

Deep learning; reproducibility; medical image analysis; performance variance

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Performance of deep learning algorithms varies due to their development data and training method, but also due to several stochastic processes during training. Due to these random factors, a single training run may not accurately reflect the performance of a given training method. Statistical comparisons in literature between different deep learning training methods typically ignore this performance variation between training runs and incorrectly claim significance of changes in training method. We hypothesize that the impact of such performance variation is substantial, such that it may invalidate biomedical competition leaderboards and some scientific papers. To test this, we investigate the reproducibility of training deep learning algorithms for medical image analysis. We repeated training runs from prior scientific studies: three diagnostic tasks (pancreatic cancer detection in CT, clinically significant prostate cancer detection in MRI, and lung nodule malignancy risk estimation in low-dose CT) and two organ segmentation tasks (pancreas segmentation in CT and prostate segmentation in MRI). A previously published top-performing algorithm for each task was trained multiple times to determine the variance in model performance. For all three diagnostic algorithms, performance variation from retraining was significant compared to data variance. Statistically comparing independently trained algorithms from the same training method using the same dataset should follow the null hypothesis, but we observed claimed significance with a p-value below 0.05 in 15% of comparisons with conventional testing (paired bootstrapping). We conclude that variance in model performance due to retraining is substantial and should be accounted for.
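The abstract names paired bootstrapping as the conventional test but this record does not give its details. The following is a minimal sketch of one common form of the test, assuming AUROC as the performance metric, case-level resampling with replacement, and a two-sided percentile p-value. The function names `auroc` and `paired_bootstrap_p_value` and all parameters are hypothetical and are not taken from the paper.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_p_value(labels, scores_a, scores_b,
                             n_iterations=2000, seed=0):
    """Two-sided p-value for the AUROC difference between models A and B.

    Assumption: both models are scored on the same test cases, so each
    bootstrap replicate resamples case indices once and applies the same
    indices to both models (this is what makes the test "paired").
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUROC is undefined without both classes; redraw
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    # Fraction of replicates on each side of zero, doubled for two sides.
    frac_le = sum(d <= 0 for d in diffs) / n_iterations
    frac_ge = sum(d >= 0 for d in diffs) / n_iterations
    return min(1.0, 2 * min(frac_le, frac_ge))
```

Note that this test resamples only the test cases, so it captures data variance and not the run-to-run variance of training. That is consistent with the abstract's observation that two models trained independently with the same method can still be declared significantly different.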


Pages: 1269-1287

Number of pages: 19

