Development of Symbolic Expressions Ensemble for Breast Cancer Type Classification Using Genetic Programming Symbolic Classifier and Decision Tree Classifier

Cited by: 6
Authors
Andelic, Nikola [1]
Baressi Segota, Sandi [1]
Affiliations
[1] Univ Rijeka, Fac Engn, Dept Automat & Elect, Vukovarska 58, Rijeka 51000, Croatia
Keywords
breast cancer; genetic programming symbolic classifier; 5-fold cross validation; random hyperparameter value search; feature-selection algorithm; optimization
DOI
10.3390/cancers15133411
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Simple Summary: Breast cancer is a type of cancer with several sub-types, and correct sub-type classification based on a large number of gene expressions is challenging even for artificial intelligence. However, accurate classification of the breast cancer sub-type in a patient is essential for applying the proper treatment. To obtain equations that can be used for accurate classification of the breast cancer sub-type, the genetic programming symbolic classifier was utilized. The large number of input variables (gene expressions) was reduced using principal component analysis, and the imbalance between class samples was addressed using different oversampling methods. The proposed procedure generated equations that can classify breast cancer sub-types with high classification accuracy, which was slightly improved by applying the decision tree classifier method.

Abstract: Breast cancer is a type of cancer with several sub-types. It occurs when cells in breast tissue grow out of control. Accurate sub-type classification of a patient diagnosed with breast cancer is essential for applying the proper treatment. Breast cancer classification based on gene expression is challenging even for artificial intelligence (AI) due to the large number of gene expressions. The idea in this paper is to utilize the genetic programming symbolic classifier (GPSC) on a publicly available dataset to obtain a set of symbolic expressions (SEs) that can classify the breast cancer sub-type from gene expressions with high classification accuracy. The initial problems with the dataset are the large number of input variables (54,676 gene expressions), the small number of samples (151), and the high imbalance between the six classes of breast cancer sub-types. The large number of input variables is addressed with principal component analysis (PCA), while the small number of samples and the large imbalance between class samples are addressed by applying different oversampling methods, which generate different dataset variations. On each oversampled dataset, the GPSC with the random hyperparameter values search (RHVS) method is trained using 5-fold cross validation (5CV) to obtain a set of SEs. The best set of SEs is chosen based on the mean values of accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. In this case, the highest value, equal to 0.992, is obtained for all evaluation metrics. The best set of SEs is additionally combined with a decision tree classifier, which slightly improves ACC to 0.994.
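The class-balancing and 5-fold cross-validation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the oversampling methods that were used, so plain random oversampling stands in here as an assumption, and stratifying the folds by class is likewise an assumption. The function names `random_oversample` and `stratified_kfold` and the toy data are hypothetical; PCA and the GPSC itself are left out.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (assumed stand-in method)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(idxs) for idxs in by_class.values())

    chosen = []
    for label in sorted(by_class):
        idxs = by_class[label]
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
        chosen.extend(idxs + extra)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with each class spread over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


# Toy data (hypothetical): three imbalanced classes, one feature per sample.
toy_labels = ["A"] * 10 + ["B"] * 5 + ["C"] * 5
toy_samples = [[float(i)] for i in range(len(toy_labels))]

X_bal, y_bal = random_oversample(toy_samples, toy_labels, seed=42)
class_counts = Counter(y_bal)
splits = list(stratified_kfold(y_bal, k=5, seed=42))
```

In a full pipeline, a classifier would be trained on each `train` index set and scored on the matching `val` set, and the per-fold ACC, AUC, precision, recall, and F1-score values would then be averaged to compare candidate models.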
Pages: 27