Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability

Times cited: 0
Authors
Gomez-Martinez, Vanesa [1]
Chushig-Muzo, David [1]
Veierød, Marit B. [2]
Granja, Conceição [3]
Soguero-Ruiz, Cristina [1]
Affiliations
[1] Rey Juan Carlos Univ, Dept Signal Theory & Commun Telematics & Comp Syst, Madrid 28943, Spain
[2] Univ Oslo, Inst Basic Med Sci, Oslo Ctr Biostat & Epidemiol, Dept Biostat, Oslo, Norway
[3] Univ Hosp North Norway, Norwegian Ctr Ehlth Res, N-9019 Tromsø, Norway
Source
BIODATA MINING | 2024 / Vol. 17 / Issue 01
Funding
European Union Horizon 2020
Keywords
Melanoma classification; Skin lesion classification; Ensemble feature selection; Tabular generative adversarial networks; Class imbalance; Interpretability methods; TEXTURAL FEATURES; ABCD RULE; CLASSIFICATION; DIAGNOSIS; MATRICES;
DOI
10.1186/s13040-024-00397-7
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background
Cutaneous melanoma is the most aggressive form of skin cancer and is responsible for most skin cancer-related deaths. Recent advances in artificial intelligence, together with the availability of public dermoscopy image datasets, have made it possible to assist dermatologists in identifying melanoma. While image feature extraction holds potential for melanoma detection, it often yields high-dimensional data. Furthermore, most image datasets suffer from class imbalance: a few classes have many samples, whereas others are under-represented.

Methods
In this paper, we propose combining ensemble feature selection (FS) methods with data augmentation based on conditional tabular generative adversarial networks (CTGAN) to enhance melanoma identification in imbalanced datasets. We used dermoscopy images from two public datasets, PH2 and Derm7pt, which contain melanoma and not-melanoma lesions. To capture intrinsic information from skin lesions, we applied two feature extraction (FE) approaches: handcrafted features and embedding features. For the former, we extracted color, geometric, and first-, second-, and higher-order texture features; for the latter, we obtained embeddings from ResNet-based models. To alleviate the high dimensionality produced by FE, we used and evaluated ensemble FS with filter methods. For data augmentation, we progressively varied the imbalance ratio (IR), which is related to the number of synthetic samples created, and evaluated its impact on predictive performance. To make the predictive models interpretable, we used SHAP, bootstrap resampling statistical tests, and UMAP visualizations.

Results
The combination of ensemble FS, CTGAN, and linear models yielded the best predictive results, with AUC-ROC values of 87% on PH2 (support vector machine, IR = 0.9) and 76% on Derm7pt (LASSO, IR = 1.0). We also found that melanoma lesions were mainly characterized by color-related features, whereas not-melanoma lesions were mainly characterized by texture features.

Conclusions
Our results demonstrate the effectiveness of ensemble FS and synthetic data for developing models that accurately identify melanoma. This research advances skin lesion analysis, contributing both to melanoma detection and to the interpretation of the main features used to identify it.
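The abstract does not state how the outputs of the filter methods are combined. A common way to build an ensemble of filters is rank aggregation: score every feature with each filter, convert the scores to ranks, and keep the features with the best mean rank. The minimal sketch below assumes scikit-learn's `f_classif` (ANOVA F-test) and `mutual_info_classif` as the two filters and mean-rank aggregation; these choices, the helper name `ensemble_filter_selection`, and the toy data are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import f_classif, mutual_info_classif


def ensemble_filter_selection(X, y, k, random_state=0):
    """Return indices of the k features with the best mean rank across filters.

    Hypothetical helper: the two filters and mean-rank aggregation are
    illustrative choices, not the paper's exact ensemble.
    """
    # Filter 1: ANOVA F-statistic per feature (higher means more class-related).
    f_scores, _ = f_classif(X, y)
    # Constant features give NaN F-scores; treat them as least relevant.
    f_scores = np.nan_to_num(f_scores, nan=0.0)
    # Filter 2: mutual information between each feature and the class label.
    mi_scores = mutual_info_classif(X, y, random_state=random_state)
    # Rank each filter's scores so that rank 1 is the most relevant feature.
    ranks = np.vstack([rankdata(-f_scores), rankdata(-mi_scores)])
    # Aggregate by averaging ranks across filters; a lower mean rank is better.
    mean_rank = ranks.mean(axis=0)
    return np.argsort(mean_rank, kind="stable")[:k]


# Toy data: 200 samples, 10 noise features, one made strongly class-dependent.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, 3] += 3.0 * y
selected = ensemble_filter_selection(X, y, k=3)
print(selected)  # feature 3 should be among the selected indices
```

Averaging ranks rather than raw scores keeps filters with very different score scales (F-statistics versus mutual information in nats) from dominating one another.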
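The reported IR values (0.9 and 1.0) suggest the ratio is expressed as minority-class size over majority-class size, so IR = 1.0 means balanced classes. Under that assumption, and assuming only the minority class is augmented, the number of synthetic samples to request from the generator for a target IR is simple arithmetic. The helper `synthetic_samples_needed` is hypothetical, and the counts 40 and 160 are the melanoma and not-melanoma totals of the full PH2 dataset, used only for illustration (not necessarily the training counts used in the paper).

```python
import math


def synthetic_samples_needed(n_minority, n_majority, target_ir):
    """Synthetic minority samples needed to reach target_ir.

    Assumptions (not confirmed by the source): IR = n_minority / n_majority,
    and only the minority class receives synthetic samples.
    """
    if not 0 < target_ir <= 1:
        raise ValueError("target_ir must be in (0, 1]")
    # Round first to absorb floating-point noise before taking the ceiling.
    target_minority = math.ceil(round(target_ir * n_majority, 9))
    # If the target IR is already reached, no synthetic samples are generated.
    return max(0, target_minority - n_minority)


# Illustrative counts: 40 melanoma vs 160 not-melanoma images (full PH2 sizes).
for ir in (0.5, 0.9, 1.0):
    n_new = synthetic_samples_needed(40, 160, ir)
    print(ir, n_new, (40 + n_new) / 160)
```

A progressive IR analysis then amounts to looping over target IR values, generating the corresponding number of synthetic rows, and retraining the classifier at each step.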
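The abstract mentions bootstrap resampling statistical tests without giving the exact procedure. One standard variant, sketched here as an assumption, tests whether a feature's mean differs between melanoma and not-melanoma lesions: both groups are shifted to the pooled mean to impose the null hypothesis, resampled with replacement, and the observed difference is compared with the bootstrap distribution. The function name `bootstrap_mean_diff_test`, the difference-of-means statistic, and the simulated feature values are hypothetical.

```python
import numpy as np


def bootstrap_mean_diff_test(a, b, n_boot=5000, seed=0):
    """Two-sided bootstrap test for a difference in means between a and b.

    Illustrative sketch; the paper's statistic and resampling scheme may differ.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    # Impose the null hypothesis: shift both groups to the pooled mean.
    pooled_mean = np.concatenate([a, b]).mean()
    a0 = a - a.mean() + pooled_mean
    b0 = b - b.mean() + pooled_mean
    # Resample each group with replacement, n_boot times, in one vectorized step.
    ia = rng.integers(0, len(a0), size=(n_boot, len(a0)))
    ib = rng.integers(0, len(b0), size=(n_boot, len(b0)))
    boot = a0[ia].mean(axis=1) - b0[ib].mean(axis=1)
    # Two-sided p-value with a +1 correction so it is never exactly zero.
    p_value = (np.sum(np.abs(boot) >= abs(observed)) + 1) / (n_boot + 1)
    return observed, p_value


# Simulated values of one feature for 40 melanoma and 160 other lesions.
rng = np.random.default_rng(1)
melanoma_values = rng.normal(1.5, 1.0, size=40)
other_values = rng.normal(0.0, 1.0, size=160)
diff, p = bootstrap_mean_diff_test(melanoma_values, other_values)
print(f"mean difference = {diff:.2f}, p = {p:.4f}")
```

Running such a test per feature (with a multiple-comparison correction) is one way to flag which features, such as color or texture descriptors, differ between the two lesion classes.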
Pages: 30