ENRICHing medical imaging training sets enables more efficient machine learning

Cited by: 8
Authors
Chinn, Erin [1]
Arora, Rohit [2]
Arnaout, Ramy [2, 3]
Arnaout, Rima [1, 4]
Affiliations
[1] Univ Calif San Francisco, Bakar Computat Hlth Sci Inst, Dept Med, Dept Radiol,Div Cardiol, San Francisco, CA USA
[2] Beth Israel Deaconess Med Ctr, Dept Pathol, Div Clin Pathol, Boston, MA USA
[3] Beth Israel Deaconess Med Ctr, Dept Med, Div Clin Informat, Boston, MA USA
[4] Univ Calif San Francisco, Bakar Computat Hlth Sci Inst, Dept Radiol, Dept Med,Div Cardiol, 521 Parnassus Ave,Rm 6222, San Francisco, CA 94143 USA
Keywords
deep learning; medical imaging; information theory; instance selection; data quality; data efficiency
DOI
10.1093/jamia/ocad055
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Deep learning (DL) has been applied in proofs of concept across biomedical imaging, spanning modalities and medical specialties. Labeled data are critical to training and testing DL models, but human expert labelers are limited. In addition, DL traditionally requires copious training data, which is computationally expensive to process and iterate over. Consequently, it is useful to prioritize those images that are most likely to improve a model's performance, a practice known as instance selection. The challenge is determining how best to prioritize. It is natural to prefer straightforward, robust, quantitative metrics as the basis for instance selection. However, in current practice, such metrics are not tailored to, and almost never used for, image datasets.
Materials and Methods: To address this problem, we introduce ENRICH (Eliminate Noise and Redundancy for Imaging Challenges), a customizable method that prioritizes images based on how much diversity each image adds to the training set.
Results: First, we show that medical datasets are special in that, in general, each image adds less diversity than in nonmedical datasets. Next, we demonstrate that ENRICH achieves nearly maximal performance on classification and segmentation tasks on several medical image datasets using only a fraction of the available images and without up-front data labeling. ENRICH outperforms random image selection, the negative control. Finally, we show that ENRICH can also be used to identify errors and outliers in imaging datasets.
Conclusions: ENRICH is a simple, computationally efficient method for prioritizing images for expert labeling and use in DL.
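The abstract does not spell out how ENRICH scores diversity, so the following is only an illustrative sketch of the general idea of diversity-based instance selection, not the authors' algorithm. It assumes each image has already been reduced to a numeric feature vector and uses cosine similarity with a greedy rule; the function names `cosine` and `greedy_diverse_selection`, the `seed_index` parameter, and the choice of similarity measure are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_diverse_selection(features, k, seed_index=0):
    """Return up to k indices, each new one chosen to be least similar to those already picked.

    Hypothetical helper: a generic greedy heuristic, not the ENRICH procedure itself.
    """
    n = len(features)
    if n == 0 or k <= 0:
        return []
    selected = [seed_index]
    # closest[i] holds the highest similarity between image i and any selected image.
    closest = [cosine(features[i], features[seed_index]) for i in range(n)]
    while len(selected) < min(k, n):
        candidates = [i for i in range(n) if i not in selected]
        # The candidate least similar to the selected set adds the most diversity.
        nxt = min(candidates, key=lambda i: closest[i])
        selected.append(nxt)
        for i in range(n):
            closest[i] = max(closest[i], cosine(features[i], features[nxt]))
    return selected


# Two near-duplicate pairs: the second pick skips the near-duplicate of the seed.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
print(greedy_diverse_selection(toy_features, 2))  # [0, 2]
```

Because the rule needs only feature vectors and no labels, it is compatible with the abstract's point that images can be prioritized before expert labeling; images that are picked very early or very late under such a rule are also natural candidates to inspect as outliers or redundant near-duplicates.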
Pages: 1079-1090
Page count: 12