Reproducibility of Training Deep Learning Models for Medical Image Analysis

Cited by: 0
Authors
Bosma, Joeran Sander [1]
Peeters, Dre [1]
Alves, Natalia [1]
Saha, Anindo [1]
Saghir, Zaigham [2]
Jacobs, Colin [1]
Huisman, Henkjan [1]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Med, Diagnost Image Anal Grp, Dept Med Imaging, NL-6525 GA Nijmegen, Netherlands
[2] Herlev Gentofte Hosp, Sect Pulm Med, Dept Med, Hellerup, Denmark
Keywords
Deep learning; reproducibility; medical image analysis; performance variance;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Performance of deep learning algorithms varies due to their development data and training method, but also due to several stochastic processes during training. Due to these random factors, a single training run may not accurately reflect the performance of a given training method. Statistical comparisons in literature between different deep learning training methods typically ignore this performance variation between training runs and incorrectly claim significance of changes in training method. We hypothesize that the impact of such performance variation is substantial, such that it may invalidate biomedical competition leaderboards and some scientific papers. To test this, we investigate the reproducibility of training deep learning algorithms for medical image analysis. We repeated training runs from prior scientific studies: three diagnostic tasks (pancreatic cancer detection in CT, clinically significant prostate cancer detection in MRI, and lung nodule malignancy risk estimation in low-dose CT) and two organ segmentation tasks (pancreas segmentation in CT and prostate segmentation in MRI). A previously published top-performing algorithm for each task was trained multiple times to determine the variance in model performance. For all three diagnostic algorithms, performance variation from retraining was significant compared to data variance. Statistically comparing independently trained algorithms from the same training method using the same dataset should follow the null hypothesis, but we observed claimed significance with a p-value below 0.05 in 15% of comparisons with conventional testing (paired bootstrapping). We conclude that variance in model performance due to retraining is substantial and should be accounted for.
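The abstract names paired bootstrapping as the conventional test but gives no implementation details. The following is a minimal sketch of one common form of that test, not the authors' code. It assumes AUROC as the metric and a two-sided p-value taken from the bootstrap distribution of the difference. The function names `auroc` and `paired_bootstrap_pvalue` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic (ties count as half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def paired_bootstrap_pvalue(labels, scores_a, scores_b, n_boot=500, seed=0):
    """Two-sided p-value for the AUROC difference between models A and B.

    Test cases are resampled with replacement, and the same resampled
    cases are scored for both models (this is the "paired" part).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if labels[idx].all() or not labels[idx].any():
            continue  # AUROC is undefined unless both classes are present
        diffs.append(auroc(labels[idx], scores_a[idx])
                     - auroc(labels[idx], scores_b[idx]))
    diffs = np.asarray(diffs)
    # Share of resamples on the less frequent side of zero, doubled.
    p_one_sided = min((diffs <= 0).mean(), (diffs >= 0).mean())
    return min(1.0, 2.0 * p_one_sided)


if __name__ == "__main__":
    # Synthetic stand-in for two retrainings of the same method (assumption):
    # both models see the same signal and differ only by independent noise.
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, size=200).astype(bool)
    run_1 = y + rng.normal(0.0, 1.0, size=200)
    run_2 = y + rng.normal(0.0, 1.0, size=200)
    print("p-value:", paired_bootstrap_pvalue(y, run_1, run_2))
```

This test resamples only the test cases, so it captures data variance and not the variance from retraining. That is consistent with the abstract's observation that comparing two runs of the same training method yields p < 0.05 more often than the nominal 5%.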
Pages: 1269-1287
Page count: 19