Development of Symbolic Expressions Ensemble for Breast Cancer Type Classification Using Genetic Programming Symbolic Classifier and Decision Tree Classifier

Cited by: 6
Authors
Andelic, Nikola [1]
Baressi Segota, Sandi [1]
Affiliations
[1] Univ Rijeka, Fac Engn, Dept Automat & Elect, Vukovarska 58, Rijeka 51000, Croatia
Keywords
breast cancer; genetic programming symbolic classifier; 5-fold cross validation; random hyperparameter value search; feature-selection algorithm; optimization
DOI
10.3390/cancers15133411
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Simple Summary: Breast cancer is a type of cancer with several sub-types, and correct sub-type classification based on a large number of gene expressions is challenging even for artificial intelligence. However, accurate classification of a patient's breast cancer is mandatory for the application of proper treatment. To obtain equations that can be used for accurate classification of the breast cancer sub-type, the genetic programming symbolic classifier was utilized. The large number of input variables (gene expressions) was reduced using principal component analysis, and the imbalance between class samples was addressed using different oversampling methods. The proposed procedure generated equations that can classify breast cancer sub-types with high classification accuracy, which was slightly improved with the application of the decision tree classifier method.

Breast cancer is a type of cancer with several sub-types. It occurs when cells in breast tissue grow out of control. Accurate sub-type classification of a patient diagnosed with breast cancer is mandatory for the application of proper treatment. Breast cancer classification based on gene expression is challenging even for artificial intelligence (AI) due to the large number of gene expressions. The idea in this paper is to apply the genetic programming symbolic classifier (GPSC) to a publicly available dataset to obtain a set of symbolic expressions (SEs) that can classify the breast cancer sub-type from gene expressions with high classification accuracy. The initial problems with the dataset are a large number of input variables (54,676 gene expressions), a small number of samples (151), and six breast cancer sub-type classes that are highly imbalanced. The large number of input variables is addressed with principal component analysis (PCA), while the small number of samples and the large imbalance between class samples are addressed by applying different oversampling methods, which generate different dataset variations. On each oversampled dataset, the GPSC with the random hyperparameter values search (RHVS) method is trained using 5-fold cross validation (5CV) to obtain a set of SEs. The best set of SEs is chosen based on the mean values of accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. In this case, the highest classification performance is equal to 0.992 for all evaluation metrics. The best set of SEs is additionally combined with a decision tree classifier, which slightly improves ACC to 0.994.
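The class balancing and 5-fold splitting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the oversampling methods used, so plain random duplication stands in for them here, and the function names `random_oversample` and `stratified_kfold_indices` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.
    Assumption: a stand-in for the unspecified oversampling methods."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def stratified_kfold_indices(labels, k=5, seed=0):
    """Return k lists of test indices. Indices are shuffled within each
    class and dealt round-robin, so every fold holds a similar share
    of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    return folds


# Toy data: three imbalanced classes (the paper's dataset has six).
xs = list(range(12))
ys = ["a"] * 8 + ["b"] * 3 + ["c"] * 1
bal_x, bal_y = random_oversample(xs, ys)
folds = stratified_kfold_indices(bal_y, k=5)
print(Counter(bal_y), [len(f) for f in folds])
```

In each of the five rounds, one fold serves as the test set and the remaining four as the training set; the per-fold metric values would then be averaged to give the mean scores the abstract refers to.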
Pages: 27