Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Cited by: 20
Authors
Calderon-Ramirez, Saul [1, 2]
Yang, Shengxiang [1 ]
Moemeni, Armaghan [3 ]
Elizondo, David [1 ]
Colreavy-Donnelly, Simon [1 ]
Chavarria-Estrada, Luis Fernando [4 ]
Molina-Cabello, Miguel A. [5, 6]
Affiliations
[1] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
[2] Inst Tecnol Costa Rica, Cartago, Costa Rica
[3] Univ Nottingham, Sch Comp Sci, Nottingham, England
[4] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
[5] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
[6] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning; DEEP; RADIOLOGY; FEATURES;
DOI
10.1016/j.asoc.2021.107692
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is identifying virus carriers as early and quickly as possible, in a cheap and efficient manner. Applying deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of a virus outbreak, where working with small labelled datasets is a challenge. Moreover, in such contexts the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to observations from the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset composed of chest X-ray images of Costa Rican adult patients is included among the tested datasets. © 2021 Elsevier B.V. All rights reserved.
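The re-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-frequency weighting scheme, the squared-error form of the unlabelled term, and all function names (`class_weights`, `weighted_labelled_loss`, `weighted_unlabelled_loss`) are assumptions made for illustration only.

```python
import numpy as np


def class_weights(labels, num_classes):
    """Inverse-frequency weights, normalised to sum to 1 (an assumed scheme)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)  # guard against empty classes
    return inv / inv.sum()


def weighted_labelled_loss(probs, labels, weights):
    """Cross-entropy in which each observation is scaled by its class weight."""
    eps = 1e-12
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * -np.log(picked + eps)))


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Squared error against soft pseudo-labels. The weight of each
    observation is taken from the class with the highest pseudo-label
    probability (an assumed selection rule)."""
    chosen = weights[np.argmax(pseudo_labels, axis=1)]
    per_obs = np.sum((probs - pseudo_labels) ** 2, axis=1)
    return float(np.mean(chosen * per_obs))


# Toy labelled set: 8 normal cases (class 0), 2 COVID-19 positive cases (class 1).
labels = np.array([0] * 8 + [1] * 2)
w = class_weights(labels, num_classes=2)
```

In this toy setting the under-represented positive class receives the larger weight, so errors on positive cases contribute more to both loss terms than errors on normal cases.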
Pages: 15